Sample-Rate Agnostic Recurrent Neural Networks

If you’re using neural networks to process signals in the time domain, you’ve probably run into the following problem: I trained my neural network on data at one sample rate, but now I want to use that network to process data at a different sample rate.

The Problem

As an example, I took an LSTM network that was trained at a 96 kHz sample rate, and passed a 100 Hz sine wave through the network, first at 96 kHz, and then at 192 kHz (double the training rate). As shown below, the output at the higher sample rate does not match the expected output. In this case, the LSTM network was designed to process audio signals, and the difference between the output at the training sample rate and the higher sample rate would be clearly audible to the listener.

Processing 192 kHz signal through an LSTM network trained on 96 kHz data.

One potential solution would be to "resample" the signal to the training sample rate before processing it through the neural network, and then resample back to the original sample rate afterwards. For most "offline" processing this solution works fine. However, for real-time applications, resampling the signal can add significant computational overhead, along with potentially adding latency to the signal, particularly since the signal may need to be resampled by a non-integer factor (for example, resampling from 44.1 kHz to 48 kHz in audio applications).

Another idea would be to train multiple neural networks for several commonly used sample rates, and use the network with the training sample rate closest to the target sample rate. However, since training neural networks is a stochastic process, there is no guarantee that each network will produce the same result. Also, this solution would not work well in situations where the neural network could be used at sample rates well outside the range of the training sample rates.

What I’d like to present here is a better solution for “adapting” recurrent neural networks to process signals at any sample rate, which applies to networks using LSTMs and GRUs.

The Solution

Signal flow for a neural network with a single recurrent layer.

The important thing to notice is how the network handles the state of the recurrent layer: delaying that state by one sample (z^-1). Recurrent neural networks are sample rate dependent because the delay time for the network state depends on the sample rate. For example, if the recurrent network is run at a 48 kHz sample rate, the recurrent state is delayed by roughly 0.02 milliseconds (1/48000 seconds) between processing steps. However, at 96 kHz, the delay time is halved to roughly 0.01 milliseconds.
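To make the one-sample state delay concrete, here is a minimal sketch of a single recurrent layer processing a signal sample by sample. I'm using a simple tanh (Elman-style) recurrent cell with randomly initialized, hypothetical weights as a stand-in for the LSTM/GRU layers discussed above; the feedback structure is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a single recurrent layer with 8 hidden units,
# standing in for the trained LSTM/GRU weights.
hidden_size = 8
W_x = rng.standard_normal((hidden_size, 1)) * 0.1        # input weights
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # recurrent weights

def run_rnn(x):
    """Process a 1-D signal one sample at a time.

    The recurrent state h is delayed by exactly one sample (z^-1):
    the state computed at step n is fed back at step n + 1.
    """
    h = np.zeros(hidden_size)
    out = []
    for sample in x:
        h = np.tanh(W_x @ np.array([sample]) + W_h @ h)
        out.append(h.copy())
    return np.array(out)

# 10 ms of a 100 Hz sine wave at 48 kHz
y = run_rnn(np.sin(2 * np.pi * 100 * np.arange(480) / 48000.0))
```

Note that the physical duration of the `z^-1` feedback in this loop is one sample period, so it shrinks as the sample rate grows — which is exactly the source of the problem.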

The solution that I’m proposing is as follows: What if we delay the recurrent state by a different number of samples depending on the sample rate at which the network is being used?

Signal flow for the same RNN, with sample rate correction. F_target refers to the target sample rate, while F_training refers to the training sample rate.

Case: Integer Multiple Sample Rate

Processing signal through an LSTM network, using a target sample rate of 2 or 3 times the training sample rate, with sample rate correction.
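For the integer-multiple case, the correction amounts to replacing the one-sample state delay with a k-sample delay line, where k = F_target / F_training. Here's a sketch of that idea, again using a simple tanh recurrent cell with hypothetical weights in place of a trained LSTM:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
hidden_size = 8
W_x = rng.standard_normal((hidden_size, 1)) * 0.1
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1

def run_rnn_corrected(x, k):
    """Process x with the recurrent state delayed by k samples.

    k = F_target / F_training, assumed to be an integer
    (e.g. k = 2 when running a 96 kHz-trained network at 192 kHz).
    """
    # Delay line holding the last k states; the oldest entry is h[n - k].
    state_delay = deque([np.zeros(hidden_size) for _ in range(k)], maxlen=k)
    out = []
    for sample in x:
        h_delayed = state_delay[0]  # state from k samples ago
        h = np.tanh(W_x @ np.array([sample]) + W_h @ h_delayed)
        state_delay.append(h)       # push newest state, drop oldest
        out.append(h.copy())
    return np.array(out)
```

With k = 1 this reduces exactly to the uncorrected network, so the same code path can serve both cases.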

Case: Non-Integer Multiple Sample Rate

Signal flow for the same RNN, with sample rate correction using a fractional delay line with linear interpolation.

While it may be possible to improve the output quality by using a higher-order interpolation method, the linear interpolation approach works pretty well in the test example that we’ve been using here.
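The fractional-delay idea can be sketched in the same style: keep enough past states to bracket the desired delay d = F_target / F_training, and linearly interpolate between the two states on either side of it. As before, the tanh cell and its weights are hypothetical stand-ins for a trained LSTM/GRU.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
hidden_size = 8
W_x = rng.standard_normal((hidden_size, 1)) * 0.1
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1

def run_rnn_fractional(x, delay):
    """Process x with the recurrent state delayed by a fractional number
    of samples: delay = F_target / F_training (assumed >= 1),
    e.g. 48000 / 44100 ~= 1.088.
    """
    n_int = int(np.floor(delay))  # integer part of the delay
    frac = delay - n_int          # fractional part, in [0, 1)
    # Keep n_int + 1 past states so we can interpolate between
    # h[n - n_int] and h[n - n_int - 1].
    buf = deque([np.zeros(hidden_size) for _ in range(n_int + 1)],
                maxlen=n_int + 1)
    out = []
    for sample in x:
        # Linear interpolation between the two states bracketing the delay:
        # buf[1] is h[n - n_int], buf[0] is h[n - n_int - 1].
        h_delayed = (1.0 - frac) * buf[1] + frac * buf[0]
        h = np.tanh(W_x @ np.array([sample]) + W_h @ h_delayed)
        buf.append(h)
        out.append(h.copy())
    return np.array(out)
```

Swapping the two-tap linear interpolator for a higher-order one (e.g. Lagrange) would only change the `h_delayed` line.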

Processing signal through an LSTM network, using a target sample rate that is a non-integer multiple of the training sample rate, with sample rate correction.

Limitations

Conclusion
