How Smooth Do You Think I Am: An Analysis on the Frequency-Dependent Temporal Roughness of Velvet Noise
Velvet noise is a sparse pseudo-random signal, with applications in late reverberation modeling, decorrelation, speech generation, and extending signals. The temporal roughness of broadband velvet noise has been studied earlier. However, the frequency dependency of the temporal roughness has received little attention. This paper explores which combinations of qualities, such as pulse density, filter type, and filter shape, contribute to frequency-dependent temporal roughness. An adaptive perceptual test was conducted to find the minimal densities of smooth noise at octave bands as well as corresponding lowpass bands. The results showed that the cutoff frequency of a lowpass filter as well as the center frequency of an octave filter is correlated with the perceived minimal density of smooth noise. When the lowpass filter with the lowest cutoff frequency, 125 Hz, was applied, the filtered velvet noise sounded smooth at an average of 725 pulses/s, whereas octave-filtered noise at a center frequency of 125 Hz sounded smooth at an average of 401 pulses/s. For the broadband velvet noise, the minimal density of smoothness was found to be at an average of 1554 pulses/s. The results of this paper are applicable in designing velvet-noise-based artificial reverberation with minimal pulse density.
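For readers unfamiliar with the signal, the following is a minimal sketch of how a velvet-noise sequence with a given pulse density can be generated. It follows the common textbook construction (one randomly placed, randomly signed unit pulse per grid interval) and is not taken from the paper; the function name `velvet_noise`, the 44.1 kHz sample rate, and the seed are illustrative assumptions.

```python
import numpy as np

def velvet_noise(duration_s, density, fs=44100, seed=0):
    """Sparse sequence of +/-1 pulses with about `density` pulses per second.

    Hypothetical helper: one pulse is placed at a random position inside
    each grid interval, which is the usual velvet-noise construction.
    """
    rng = np.random.default_rng(seed)
    n_samples = int(round(duration_s * fs))
    n_pulses = int(round(duration_s * density))
    grid = n_samples / n_pulses                  # average pulse spacing in samples
    m = np.arange(n_pulses)
    # Random offset inside each grid interval, rounded down to a sample index.
    positions = np.floor((m + rng.random(n_pulses)) * grid).astype(int)
    positions = np.minimum(positions, n_samples - 1)
    signs = rng.choice([-1.0, 1.0], size=n_pulses)
    out = np.zeros(n_samples)
    # If two neighbouring pulses round to the same sample, the later one wins.
    out[positions] = signs
    return out

# Densities quoted in the abstract: broadband and 125 Hz lowpass thresholds.
broadband = velvet_noise(1.0, 1554)
lowpass_case = velvet_noise(1.0, 725)
```

The lowpass or octave filtering studied in the paper would be applied to such a sequence afterwards; that step is left out here.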
Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing
In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
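As a rough illustration of the building block named in the abstract, the sketch below computes cubic Lagrange interpolation taps for a fractional delay and uses them to read a delay line. How the paper wires such a delay into the recurrent network is not shown here; the helper names `lagrange_coeffs` and `read_delayed` and the 44.1 kHz to 48 kHz ratio are assumptions chosen for the example.

```python
import numpy as np

def lagrange_coeffs(delay, order=3):
    """FIR taps h[0..order] approximating a delay of `delay` samples.

    h[n] = prod_{k != n} (delay - k) / (n - k); order=3 gives the cubic case.
    """
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h

def read_delayed(history, delay):
    """Interpolated read, where history[i] is the value from i samples ago."""
    taps = lagrange_coeffs(delay)
    return float(np.dot(taps, history[:len(taps)]))

# Example: a model trained at 44.1 kHz but run at 48 kHz would need a state
# delay of 48000 / 44100 samples instead of one sample (assumed scenario).
ratio = 48000 / 44100
taps = lagrange_coeffs(ratio)
```

The taps always sum to one, and an integer delay reduces to a plain delay-line read, which is why the integer-oversampling case needs no interpolation.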
A Diffusion-Based Generative Equalizer for Music Restoration
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a task that, to the best of our knowledge, has not been previously addressed for music restoration. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music. Historical music restoration examples are available at: research.spa.aalto.fi/publications/papers/dafx-babe2/.
Guitar Tone Stack Modeling with a Neural State-Space Filter
In this work, we present a data-driven approach to modeling tone stack circuits in guitar amplifiers and distortion pedals. To this end, the proposed modeling approach uses a feedforward fully connected neural network to predict the parameters of a coupled-form state-space filter, ensuring the numerical stability of the resulting time-varying system. The neural network is conditioned on the tone controls of the target tone stack and is optimized jointly with the coupled-form state-space filter to match the target frequency response. To assess the proposed approach, we model three popular tone stack schematics with both matched-order and over-parameterized filters and conduct an objective comparison with well-established approaches that use cascaded biquad filters. Results from the conducted experiments demonstrate improved accuracy of the proposed modeling approach, especially in the case of over-parameterized state-space filters, while guaranteeing numerical stability. Our method can be deployed, after training, in real-time audio processors.
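One way to see why a coupled-form state matrix makes stability easy to enforce is that its pole radius appears as an explicit parameter. The sketch below is an assumption about how such a constraint could look, not the paper's parameterization: the sigmoid mappings and the name `coupled_form_matrix` are hypothetical.

```python
import numpy as np

def coupled_form_matrix(raw_radius, raw_angle):
    """2x2 coupled-form state matrix with poles at r * exp(+/- j*theta).

    The unconstrained inputs stand in for network outputs (assumption);
    sigmoids map them to 0 < r < 1 and 0 < theta < pi.
    """
    r = 1.0 / (1.0 + np.exp(-raw_radius))
    theta = np.pi / (1.0 + np.exp(-raw_angle))
    c, s = np.cos(theta), np.sin(theta)
    return r * np.array([[c, -s], [s, c]])

A = coupled_form_matrix(raw_radius=3.0, raw_angle=-1.0)
pole_radii = np.abs(np.linalg.eigvals(A))   # both equal r, hence below one
```

Because the matrix is a scaled rotation, any change of the two parameters over time keeps the poles inside the unit circle, which is the property that matters for a time-varying filter.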
RIR2FDN: An Improved Room Impulse Response Analysis and Synthesis
This paper seeks to improve the state-of-the-art in delay-network-based analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-learning approach. A formal listening test was conducted where participants assessed the similarity of reverberated material across seven distinct RIRs and three different sound sources. The results reveal that the performance of these methods is influenced by both the excitation sounds and the reverberation conditions. Nonetheless, the proposed method consistently demonstrates higher similarity ratings compared to the end-to-end approach across most conditions. However, achieving an indistinguishable synthesis of measured RIRs remains a persistent challenge, underscoring the complexity of this problem. Overall, this work helps improve the sound quality of analysis-based artificial reverberation.
Binaural Dark-Velvet-Noise Reverberator
Binaural late-reverberation modeling necessitates the synthesis of frequency-dependent inter-aural coherence, a crucial aspect of spatial auditory perception. Prior studies have explored methodologies such as filtering and cross-mixing two incoherent late reverberation impulse responses to emulate the coherence observed in measured binaural late reverberation. In this study, we introduce two variants of the binaural dark-velvet-noise reverberator. The first one uses cross-mixing of two incoherent dark-velvet-noise sequences that can be generated efficiently. The second variant is a novel time-domain jitter-based approach. The methods’ accuracies are assessed through objective and subjective evaluations, revealing that both methods yield comparable performance and clear improvements over using incoherent sequences. Moreover, the advantages of the jitter-based approach over cross-mixing are highlighted by introducing a parametric width control, based on the jitter-distribution width, into the binaural dark velvet noise reverberator. The jitter-based approach can also introduce time-dependent coherence modifications without additional computational cost.
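The cross-mixing idea can be illustrated with a broadband toy example: two uncorrelated, equal-power sequences are mixed with a single angle so that the resulting pair has a chosen correlation. This is a simplified sketch under stated assumptions, using Gaussian noise as a stand-in for dark velvet noise and a single frequency-independent coherence value; the function name `cross_mix` is hypothetical.

```python
import numpy as np

def cross_mix(a, b, target_coherence):
    """Mix two incoherent, equal-power signals into a left/right pair.

    With mixing angle phi, the normalized correlation of the outputs is
    sin(2 * phi), so phi = 0.5 * arcsin(target_coherence).
    """
    phi = 0.5 * np.arcsin(target_coherence)
    left = np.cos(phi) * a + np.sin(phi) * b
    right = np.sin(phi) * a + np.cos(phi) * b
    return left, right

rng = np.random.default_rng(1)
a = rng.standard_normal(200_000)   # stand-in for one incoherent sequence
b = rng.standard_normal(200_000)   # stand-in for the other
left, right = cross_mix(a, b, 0.6)
measured = np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right))
```

A frequency-dependent target would replace the scalar mixing gains with filters, which this sketch does not attempt.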
Differentiable Active Acoustics - Optimizing Stability via Gradient Descent
Active acoustics (AA) refers to an electroacoustic system that actively modifies the acoustics of a room. For common use cases, the number of transducers—loudspeakers and microphones—involved in the system is large, resulting in a large number of system parameters. To optimally blend the response of the system into the natural acoustics of the room, the parameters require careful tuning, which is a time-consuming process performed by an expert. In this paper, we present a differentiable AA framework, which allows multi-objective optimization without impairing architecture flexibility. The system is implemented in PyTorch to be easily translated into a machine-learning pipeline, thus automating the tuning process. The objective of the pipeline is to optimize the digital signal processor (DSP) component to evenly distribute the energy in the feedback loop across frequencies. We investigate the effectiveness of DSPs composed of finite impulse response filters, which are unconstrained during the optimization. We study the effect of multiple filter orders, number of transducers, and loss functions on the performance. Different loss functions behave similarly for systems with few transducers and low-order filters. Increasing the number of transducers and the order of the filters improves results and accentuates the difference in the performance of the loss functions.
Real-Time Implementation of a Linear-Phase Octave Graphic Equalizer
This paper proposes a real-time implementation of a linear-phase octave graphic equalizer (GEQ), previously introduced by the same authors. The structure of the GEQ is based on interpolated finite impulse response (IFIR) filters and is derived from a single prototype FIR filter. The low computational cost and small latency make the presented GEQ suitable for real-time applications. In this work, the GEQ has been implemented as a plugin for a specific software environment, which was used for real-time tests. The performance of the equalizer has been evaluated through subjective tests, comparing it with a filterbank equalizer. For the tests, four standard equalization curves have been chosen. The experimental results are promising: the outcome is an accurate real-time-capable linear-phase GEQ with a reasonable latency.
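The core IFIR operation, stretching a prototype FIR filter by inserting zeros between its taps, can be sketched as follows. The windowed-sinc prototype, its length, and the helper names are assumptions for illustration and do not reproduce the paper's design.

```python
import numpy as np

def stretch_fir(prototype, factor):
    """Return the taps of H(z**factor): factor-1 zeros between prototype taps."""
    stretched = np.zeros((len(prototype) - 1) * factor + 1)
    stretched[::factor] = prototype
    return stretched

def dtft(h, omega):
    """Frequency response of FIR taps h at normalized angular frequency omega."""
    n = np.arange(len(h))
    return np.sum(h * np.exp(-1j * omega * n))

# Hypothetical symmetric (linear-phase) half-band lowpass prototype.
n = np.arange(17)
prototype = 0.5 * np.sinc(0.5 * (n - 8)) * np.hamming(17)
octave_lower = stretch_fir(prototype, 2)
```

Stretching by two compresses the frequency response by a factor of two, so the same prototype shape reappears one octave lower (plus an image near Nyquist that a practical design has to suppress), and the taps stay symmetric, so the phase stays linear. The zero-valued taps need no multiplications, which is where the low computational cost comes from.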
Perceptual Decorrelator Based on Resonators
Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.
Zero-Phase Sound via Giant FFT
Given the speedy computation of the FFT in current computer hardware, there are new possibilities for examining transformations for very long sounds. A zero-phase version of any audio signal can be obtained by zeroing the phase angle of its complex spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to enhance the resulting sound quality. As a result, a sound with the same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are retained, however. Zero-phase sounds are palindromic in the sense that they are symmetric in time. A comparison of the zero-phase conversion to the autocorrelation function helps to understand its properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same autocorrelation function as the original sound. One exciting variation of the method is to apply the method separately to the real and imaginary parts of the spectrum to produce a stereo effect. A frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
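The basic conversion described in the abstract, zeroing the phase of the spectrum and inverting the FFT, can be sketched in a few lines. This minimal version includes only zero-padding; the transient suppression and gain compensation recommended in the paper are omitted, and the function name `zero_phase` and the test signal are assumptions.

```python
import numpy as np

def zero_phase(x, pad_factor=2):
    """Zero-phase version of x: keep the magnitude spectrum, drop the phase."""
    n_fft = pad_factor * len(x)                 # zero-padding via the FFT length
    magnitude = np.abs(np.fft.rfft(x, n_fft))
    y = np.fft.irfft(magnitude, n_fft)          # real and even around index 0
    return np.fft.fftshift(y)                   # move time zero to the center

rng = np.random.default_rng(0)
x = rng.standard_normal(1024) * np.exp(-np.arange(1024) / 200.0)  # decaying burst
z = zero_phase(x)
center = len(z) // 2
```

The output is symmetric around `center`, which is the palindromic property mentioned above, and its magnitude spectrum equals that of the zero-padded input. Since the autocorrelation depends only on the magnitude spectrum, it is unchanged as well.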