Sample-Rate Agnostic Recurrent Neural Networks

The Problem

For some types of neural networks, this problem isn’t really a problem at all. If your network is “memoryless,” like most networks made up of only fully-connected layers, then it is already sample-rate agnostic. However, for “stateful” neural networks, including convolutional and recurrent networks, processing a signal at a different sample rate than the training data will produce wildly different results.

Processing 192 kHz signal through an LSTM network trained on 96 kHz data.

The Solution

Let’s start by examining the signal flow of a simple recurrent neural network:

Signal flow for a neural network with a single recurrent layer.
Signal flow for the same RNN, with sample rate correction. F_target refers to the target sample rate, while F_training refers to the training sample rate.
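
To ground the diagrams above in code, here’s a minimal sketch of a single recurrent layer running sample-by-sample. I’m using a hypothetical vanilla RNN cell with illustrative weight names W, U, and b; the key point is that the recurrent state h is fed back with a one-sample delay, and the same idea applies to LSTM and GRU cells.

```python
import numpy as np

def run_rnn(x, W, U, b):
    """Process a signal sample-by-sample through a vanilla RNN layer.

    The recurrent state h is fed back with a one-sample delay; this
    delay is what ties the layer to the sample rate it was trained at.
    """
    h = np.zeros(len(b))                    # recurrent state (one sample old)
    y = np.zeros((len(x), len(b)))
    for n in range(len(x)):
        h = np.tanh(W * x[n] + U @ h + b)   # new state from input + old state
        y[n] = h
    return y
```

Note that the sample rate correction shown in the second diagram doesn’t change the cell itself at all; it only changes how far back in time the state h is read from.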

Case: Integer Multiple Sample Rate

If the neural network is being run at a sample rate that is an integer multiple of the training sample rate, then we just need to delay the recurrent state by that many samples instead of one. For example, at 2x the training sample rate, the state is delayed by two samples; at 3x, by three. As we can see below, this solution works pretty well when the target sample rate is an integer multiple of the training sample rate.

Processing signal through an LSTM network, using a target sample rate of 2 or 3 times the training sample rate, with sample rate correction.
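
Here’s a minimal sketch of this correction, modifying the illustrative RNN loop from above. Instead of holding a single past state, we keep a circular buffer of the last k states and feed back the one from k samples ago:

```python
import numpy as np

def run_rnn_integer(x, W, U, b, k):
    """Vanilla RNN whose recurrent state is delayed by k samples,
    for a target rate that is k times the training rate (k = 1, 2, 3, ...)."""
    hidden = len(b)
    buf = np.zeros((k, hidden))        # circular buffer of the last k states
    y = np.zeros((len(x), hidden))
    for n in range(len(x)):
        h_delayed = buf[n % k]         # state from k samples ago
        h = np.tanh(W * x[n] + U @ h_delayed + b)
        buf[n % k] = h                 # overwrite the oldest entry
        y[n] = h
    return y
```

With k = 1, this reduces exactly to the standard one-sample recurrence.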

Case: Non-Integer Multiple Sample Rate

What if the target sample rate is not an integer multiple of the training sample rate? In that case, the recurrent state must be delayed by a non-integer number of samples (F_target / F_training), so we need a fractional delay instead of an integer-sample delay. While there are several types of delay-line interpolation that could be used to implement this fractional delay, for now we’ll stick with the simplest method: linear interpolation.

Signal flow for the same RNN, with sample rate correction using a fractional delay line with linear interpolation.
Processing signal through an LSTM network, using a target sample rate that is a non-integer multiple of the training sample rate, with sample rate correction.
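
Here’s a sketch of the fractional-delay variant, again using the illustrative vanilla RNN cell from above. It assumes the delay ratio F_target / F_training is at least 1 (more on this in the Limitations section below) and reads the delayed state out of a short circular buffer via linear interpolation:

```python
import numpy as np

def run_rnn_fractional(x, W, U, b, ratio):
    """Vanilla RNN whose recurrent state is delayed by `ratio` samples,
    where ratio = F_target / F_training >= 1 (possibly non-integer)."""
    hidden = len(b)
    i = int(np.floor(ratio))           # integer part of the delay
    f = ratio - i                      # fractional part, in [0, 1)
    buf_len = i + 2                    # enough history for taps at i and i + 1
    buf = np.zeros((buf_len, hidden))
    y = np.zeros((len(x), hidden))
    for n in range(len(x)):
        # Linearly interpolate between the states i and i + 1 samples ago.
        h_a = buf[(n - i) % buf_len]
        h_b = buf[(n - i - 1) % buf_len]
        h_delayed = (1.0 - f) * h_a + f * h_b
        h = np.tanh(W * x[n] + U @ h_delayed + b)
        buf[n % buf_len] = h
        y[n] = h
    return y
```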

Limitations

The limitation of the solution proposed here is that it only works when the target sample rate is at or above the training sample rate, since implementing a fractional delay shorter than one sample would introduce a delay-free loop. However, a good workaround in this case would be to upsample the signal by an integer factor until it is at or above the training sample rate.
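
As a small sketch of this workaround (the function name here is my own), the required integer upsampling factor can be computed like so:

```python
import math

def oversampling_factor(f_target, f_training):
    """Smallest integer factor that brings f_target to at least f_training,
    so that the fractional state delay stays >= 1 sample."""
    return max(1, math.ceil(f_training / f_target))

# e.g. f_target = 44100, f_training = 96000 -> factor of 3,
# so the model runs at 132300 Hz with a state delay of 132300 / 96000 samples
```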

Conclusion

Here we’ve described one possible solution for constructing sample-rate agnostic neural networks. At the moment I’ve implemented this scheme in a couple of real-time audio effects, but I hope that these ideas will be useful outside of the audio domain as well. The code for generating the plots and other figures in this article can be found on GitHub.
