Download Differentiable grey-box modelling of phaser effects using frame-based spectral processing Machine learning approaches to modelling analog audio effects have seen intensive investigation in recent years, particularly in the context of non-linear time-invariant effects such as guitar amplifiers. For modulation effects such as phasers, however, new challenges emerge due to the presence of the low-frequency oscillator which controls the slowly time-varying nature of the effect. Existing approaches have either required foreknowledge of this control signal, or have been non-causal in implementation. This work presents a differentiable digital signal processing approach to modelling phaser effects in which the underlying control signal and time-varying spectral response of the effect are jointly learned. The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain, with a transfer function based on typical analog phaser circuit topology. We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters. The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy. The optimal frame length depends on both the rate and transient decay-time of the target effect, but the frame length can be altered at inference time without a significant change in accuracy.
Download Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
Download Differentiable All-Pole Filters for Time-Varying Audio Systems Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-toend training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by reexpressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin1 .
Download Anti-Aliasing of Neural Distortion Effects via Model Fine Tuning Neural networks have become ubiquitous with guitar distortion
effects modelling in recent years. Despite their ability to yield
perceptually convincing models, they are susceptible to frequency
aliasing when driven by high frequency and high gain inputs.
Nonlinear activation functions create both the desired harmonic
distortion and unwanted aliasing distortion as the bandwidth of
the signal is expanded beyond the Nyquist frequency. Here, we
present a method for reducing aliasing in neural models via a
teacher-student fine tuning approach, where the teacher is a pretrained model with its weights frozen, and the student is a copy of
this with learnable parameters. The student is fine-tuned against
an aliasing-free dataset generated by passing sinusoids through
the original model and removing non-harmonic components from
the output spectra.
Our results show that this method significantly suppresses aliasing for both long-short-term-memory networks (LSTM) and temporal convolutional networks (TCN). In the
majority of our case studies, the reduction in aliasing was greater
than that achieved by two times oversampling. One side-effect
of the proposed method is that harmonic distortion components
are also affected.
This adverse effect was found to be modeldependent, with the LSTM models giving the best balance between
anti-aliasing and preserving the perceived similarity to an analog
reference device.