FDNTB: The Feedback Delay Network Toolbox

Feedback delay networks (FDNs) are recursive filters, which are
widely used for artificial reverberation and decorrelation. While
there exists a vast literature on a wide variety of reverb topologies,
this work aims to provide a unifying framework to design and analyze delay-based reverberators. To this end, we present the Feedback Delay Network Toolbox (FDNTB), a collection of MATLAB functions and example scripts. The FDNTB includes various representations of FDNs and corresponding translation functions. Further, it provides a selection of special feedback matrices,
topologies, and attenuation filters. In particular, more advanced
algorithms such as modal decomposition, time-varying matrices,
and filter feedback matrices are readily accessible. Furthermore,
our toolbox contains several additional FDN designs. Providing
MATLAB code under a GNU-GPL 3.0 license and including illustrative examples, we aim to foster research and education in the
field of audio processing.
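The FDN structure the toolbox implements can be sketched in a few lines. The following is a minimal Python illustration of a four-delay-line FDN with an orthogonal (Hadamard) feedback matrix; delay lengths, gains, and function names are illustrative choices, not the FDNTB's MATLAB API.

```python
import numpy as np

def fdn_impulse_response(n=4000):
    """Impulse response of a toy 4-delay-line FDN (illustrative parameters)."""
    delays = np.array([149, 211, 263, 293])        # mutually prime lengths
    H = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]]) / 2.0          # orthogonal Hadamard mixing
    A = 0.97 * H                                   # lossy feedback matrix
    b = np.ones(4)                                 # input gains
    c = np.ones(4) / 4                             # output gains
    bufs = [np.zeros(d) for d in delays]           # circular delay-line buffers
    idx = np.zeros(4, dtype=int)
    x = np.zeros(n); x[0] = 1.0                    # unit impulse input
    h = np.zeros(n)
    for t in range(n):
        outs = np.array([bufs[i][idx[i]] for i in range(4)])  # delay-line outputs
        h[t] = c @ outs
        fb = A @ outs + b * x[t]                   # recirculate and inject input
        for i in range(4):
            bufs[i][idx[i]] = fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return h
```

Because the feedback matrix is a scaled orthogonal matrix with gain below one, the response is stable and decays, which is the basic property the toolbox's analysis functions build on.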
Velvet-Noise Feedback Delay Network

Artificial reverberation is an audio effect used to simulate the acoustics of a space while controlling its aesthetics, particularly on sounds
recorded in a dry studio environment. Delay-based methods are
a family of artificial reverberators using recirculating delay lines
to create this effect.
The feedback delay network is a popular
delay-based reverberator providing a comprehensive framework
for parametric reverberation by formalizing the recirculation of
a set of interconnected delay lines. However, one known limitation of this algorithm is the initial slow build-up of echoes, which
can sound unrealistic, and overcoming this problem often requires
adding more delay lines to the network. In this paper, we study the
effect of adding velvet-noise filters, which have random sparse coefficients, at the input and output branches of the reverberator. The
goal is to increase the echo density while minimizing the spectral coloration. We compare different variations of velvet-noise
filtering and show their benefits. We demonstrate that with velvet
noise, the echo density of a conventional feedback delay network
can be exceeded using half the number of delay lines and saving
over 50% of computing operations in a practical configuration using low-order attenuation filters.
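A velvet-noise sequence of the kind used in these filters is sparse: one randomly signed, randomly placed unit impulse per grid period. A minimal sketch, with illustrative density and length values:

```python
import numpy as np

def velvet_noise(fs=44100, density=2000, dur=0.03, rng=None):
    """Sparse velvet-noise sequence: one +/-1 impulse per grid period."""
    rng = np.random.default_rng(0) if rng is None else rng
    Td = fs / density                                  # average spacing (samples)
    n_pulses = int(dur * density)
    # jitter each impulse uniformly within its own grid period
    k = (np.arange(n_pulses) * Td + rng.uniform(0, Td, n_pulses)).astype(int)
    s = np.zeros(int(dur * fs))
    k = k[k < len(s)]
    s[k] = rng.choice([-1.0, 1.0], size=len(k))        # random signs
    return s
```

The sparsity is what makes velvet-noise filtering cheap: convolution reduces to a handful of signed additions per output sample, which is how the echo density of the FDN can be raised at low cost.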
Fade-in Control for Feedback Delay Networks

In virtual acoustics, it is common to simulate the early part of a
Room Impulse Response using approaches from geometrical acoustics and the late part using Feedback Delay Networks (FDNs). In
order to transition from the early to the late part, it is useful to
slowly fade in the FDN response. We propose two methods to control the fade-in, one based on double decays and the other based
on modal beating. We use modal analysis to explain the two concepts for incorporating this fade-in behaviour entirely within the
IIR structure of a multiple input multiple output FDN. We present
design equations, which allow for placing the fade-in time at an
arbitrary point within its derived limit.
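The double-decay idea can be illustrated numerically: subtracting a fast exponential decay from a slow one yields an envelope that first rises and then decays, i.e. a fade-in. The decay rates below are arbitrary examples, not the paper's design values, and the peak time follows from elementary calculus rather than from the paper's design equations.

```python
import numpy as np

fs = 1000
t = np.arange(0, 2.0, 1 / fs)
a, b = 2.0, 20.0                       # slow and fast decay rates (1/s)
env = np.exp(-a * t) - np.exp(-b * t)  # double-decay fade-in envelope
t_peak = np.log(b / a) / (b - a)       # analytic peak (fade-in) time
```

Choosing the two decay rates places the envelope peak, which mirrors how the paper's design equations place the fade-in time within its derived limit.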
Delay Network Architectures for Room and Coupled Space Modeling

Feedback delay network reverberators have decay filters associated with each delay line to model the frequency-dependent reverberation time (T60) of a space. The decay filters are typically
designed such that all delay lines independently produce the same
T60 frequency response. However, in real rooms, there are multiple, concurrent T60 responses that depend on the geometry and
physical properties of the materials present in the rooms. In this
paper, we propose the Grouped Feedback Delay Network (GFDN),
where groups of delay lines share different target T60s. We use the
GFDN to simulate coupled rooms, where one room is significantly
larger than the other. We also simulate rooms with different materials, with unique decay filters associated with each delay line
group, designed to represent the T60 characteristics of a particular
material. The T60 filters are designed to emulate the materials’ absorption characteristics with minimal computation. We discuss the
design of the mixing matrix to control inter- and intra-group mixing, and show how the amount of mixing affects the behavior of the
room modes. Finally, we discuss the inclusion of air absorption
filters on each delay line and physically motivated room resizing
techniques with the GFDN.
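One simple way to build an orthogonal mixing matrix with controllable inter-group coupling is to combine per-group orthogonal matrices with a rotation between groups; the angle then sets how much energy crosses between the groups. This construction is an illustrative sketch, not necessarily the paper's exact design.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random orthogonal matrix via QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def grouped_mixing_matrix(n_per_group, theta, rng=None):
    """Orthogonal mixing matrix for two delay-line groups.

    theta = 0 gives purely block-diagonal (intra-group) mixing;
    larger theta increases inter-group coupling.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    A1 = random_orthogonal(n_per_group, rng)       # intra-group mixing, group 1
    A2 = random_orthogonal(n_per_group, rng)       # intra-group mixing, group 2
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    intra = np.block([[A1, np.zeros_like(A1)],
                      [np.zeros_like(A2), A2]])
    inter = np.kron(R, np.eye(n_per_group))        # couples the two groups
    return inter @ intra                           # product of orthogonals
```

Since a Kronecker product of orthogonal matrices and a product of orthogonal matrices are both orthogonal, the result is lossless for any theta, so coupling strength can be tuned without affecting stability.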
Energy-Preserving Time-Varying Schroeder Allpass Filters

In artificial reverb algorithms, gains are commonly varied over
time to break up temporal patterns, improving quality. We propose
a family of novel Schroeder-style allpass filters that are energy-preserving under arbitrary, continuous changes of their gains over
time. All of them are canonic in delays, and some are also canonic
in multiplies. This yields several structures that are novel even in
the time-invariant case. Special cases for cascading and nesting
these structures with a reduced number of multipliers are shown as
well. The proposed structures should be useful in artificial reverb
applications and other time-varying audio effects based on allpass
filters, especially where allpass filters are embedded in feedback
loops and stability may be an issue.
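For reference, the classic time-invariant Schroeder allpass has the difference equation y[n] = -g x[n] + x[n-M] + g y[n-M]. The sketch below implements this plain form; the paper's contribution is a family of variants of this structure that remain energy-preserving when g changes over time, which this textbook form does not.

```python
import numpy as np

def schroeder_allpass(x, M=113, g=0.7):
    """Classic Schroeder allpass: y[n] = -g*x[n] + x[n-M] + g*y[n-M]."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        xd = x[n - M] if n >= M else 0.0   # delayed input
        yd = y[n - M] if n >= M else 0.0   # delayed output (feedback)
        y[n] = -g * x[n] + xd + g * yd
    return y
```

Because |H(e^jw)| = 1 for fixed g, the impulse-response energy equals one (Parseval), which is the energy-preservation property the paper extends to time-varying gains.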
Perceptual Evaluation of Mitigation Approaches of Impairments Due to Spatial Undersampling in Binaural Rendering of Spherical Microphone Array Data: Dry Acoustic Environments

Employing a finite number of discrete microphones, instead of a
continuous distribution according to theory, reduces the physical
accuracy of sound field representations captured by a spherical microphone array. A number of approaches have been proposed in the literature to mitigate the perceptual impairment that arises when such captured sound fields are reproduced binaurally. We recently presented a perceptual evaluation of a representative set of approaches in conjunction with reverberant acoustic environments. This paper presents a similar study
but with acoustically dry environments with reverberation times
of less than 0.25 s. We examined the Magnitude Least-Squares
algorithm, the Bandwidth Extraction Algorithm for Microphone
Arrays, Spherical Head Filters, spherical harmonics Tapering, and
Spatial Subsampling, all up to a spherical harmonics order of 7.
Although dry environments violate some of the assumptions underlying some of the approaches, we can confirm the results of
our previous study: most approaches achieve an improvement, and the magnitude of the improvement is comparable across
approaches and acoustic environments.
Interaural Cues Cartography: Localization Cues Repartition for Three Spatialization Methods

The Synthetic Transaural Audio Rendering (STAR) method, first
introduced at DAFx-06 and then enhanced at DAFx-19, is a perceptual approach to sound spatialization that aims at reproducing the acoustic cues at the ears of the listener using loudspeakers. To validate the method, several comparisons with state-of-the-art spatialization methods (VBAP and HOA) were conducted. Previously,
quality comparisons with human subjects have been made, providing meaningful subjective results in real conditions. In this
article an objective comparison is proposed, using acoustic cues
error maps. The cartography enables us to study the spatialization
effect in a 2D space, for a listening position within an audience,
and thus not necessarily located at the center. Two approaches
are conducted: the first simulates the binaural signals for a virtual KEMAR manikin, in ideal conditions and with a fine resolution; the second records these binaural signals using a real KEMAR manikin, providing real data with reverberation, though with
a coarser resolution. In both cases the acoustic cues were derived
from the binaural signals (either simulated or measured), and compared to the reference value taken at the center of the octophonic
loudspeaker configuration. The obtained error maps show encouraging results, with our STAR method producing the smallest error for
both simulated and experimental conditions.
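Deriving interaural cues from a pair of binaural signals can be sketched simply: the interaural time difference (ITD) from the lag of the interaural cross-correlation, and the interaural level difference (ILD) from the energy ratio. This broadband, unfiltered version is a simplified stand-in for whatever cue model the study actually uses.

```python
import numpy as np

def interaural_cues(left, right, fs):
    """Broadband ITD (s) and ILD (dB) from a binaural signal pair.

    Positive ITD means the left-ear signal arrives later than the right.
    """
    xcorr = np.correlate(left, right, mode='full')       # interaural cross-correlation
    lag = np.argmax(xcorr) - (len(right) - 1)            # peak lag in samples
    itd = lag / fs
    ild = 10 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild
```

Evaluating such cues on a grid of listening positions and comparing them to the values at the central reference position is the kind of computation behind the error maps described above.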
Neural Parametric Equalizer Matching Using Differentiable Biquads

This paper proposes a neural network for carrying out parametric equalizer (EQ) matching. The novelty of this neural network
solution is that it can be optimized directly in the frequency domain by means of differentiable biquads, rather than relying solely
on a loss on parameter values, which does not correlate directly
with the system output. We compare the performance of the proposed neural network approach with that of a baseline algorithm
based on a convex relaxation of the problem. It is observed that the
neural network can provide better matching than the baseline approach because it directly attempts to solve the non-convex problem. Moreover, we show that the same network trained with only
a parameter loss is insufficient for the task, despite the fact that it
matches underlying EQ parameters better than one trained with a
combination of spectral and parameter losses.
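The idea behind differentiable biquads can be sketched concretely: peak-EQ parameters map to biquad coefficients in closed form (here the well-known RBJ Audio EQ Cookbook formulas), so a spectral loss on the resulting magnitude response is differentiable with respect to the parameters when implemented in an autodiff framework. NumPy stands in for such a framework in this illustration; the paper's actual network and loss may differ.

```python
import numpy as np

def peak_eq_coeffs(fc, gain_db, Q, fs):
    """Peak-EQ biquad coefficients (RBJ cookbook), normalized so a[0] = 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / (2 * Q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def magnitude_db(b, a, w):
    """Biquad magnitude response in dB at angular frequencies w (rad/sample)."""
    z = np.exp(-1j * w)                                # z^-1 on the unit circle
    H = (b[0] + b[1] * z + b[2] * z ** 2) / (a[0] + a[1] * z + a[2] * z ** 2)
    return 20 * np.log10(np.abs(H))
```

A frequency-domain matching loss would then be, e.g., the mean squared difference between `magnitude_db` of the predicted and target EQs over a frequency grid, every step of which is a smooth function of (fc, gain_db, Q).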
Relative Music Loudness Estimation Using Temporal Convolutional Networks and a CNN Feature Extraction Front-End

Relative music loudness estimation is a MIR task that consists of dividing audio into segments of three classes: Foreground Music,
Background Music, and No Music. Given the temporal correlation
of music, in this work we approach the task using a type of network
with the ability to model temporal context: the Temporal Convolutional Network (TCN). We propose two architectures: a TCN,
and a novel architecture resulting from the combination of a TCN
with a Convolutional Neural Network (CNN) front-end. We name
this new architecture CNN-TCN. We expect the CNN front-end to
work as a feature extraction strategy to achieve a more efficient usage of the network’s parameters. We use the OpenBMAT dataset
to train and test 40 TCN and 80 CNN-TCN models with two grid
searches over a set of hyper-parameters. We compare our models with the two best algorithms submitted to the tasks of music
detection and relative music loudness estimation in MIREX 2019.
All our models outperform the MIREX algorithms even when using fewer parameters. The CNN-TCN emerges as the
best architecture as all its models outperform all TCN models. We
show that adding a CNN front-end to a TCN can actually reduce
the number of parameters of the network while improving performance. The CNN front-end effectively works as a feature extractor, producing consistent patterns that identify different combinations of music and non-music sounds, and also helps produce
a smoother output in comparison to the TCN models.
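The temporal context a TCN can model is bounded by its receptive field, which grows with depth through the dilation of its causal convolutions. A quick sketch of that bookkeeping (the kernel size and dilation schedule below are the common powers-of-two pattern, not necessarily this paper's hyper-parameters):

```python
def tcn_receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of a stack of dilated causal conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, kernel size 3 with dilations 1, 2, 4, ..., 64 already covers 255 frames, which is why a TCN can model long-range structure with few layers.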
Neural Modelling of Time-Varying Effects

This paper proposes a grey-box neural network based approach to modelling LFO-modulated time-varying effects. The neural
network model receives both the unprocessed audio, as well as
the LFO signal, as input. This allows complete control over the
model’s LFO frequency and shape. The neural networks are trained
using guitar audio, which has to be processed by the target effect
and also annotated with the predicted LFO signal before training.
A measurement signal based on regularly spaced chirps was used
to accurately predict the LFO signal. The model architecture has
been previously shown to be capable of running in real-time on a
modern desktop computer, whilst using relatively little processing
power. We validate our approach by creating models of both a phaser and a flanger effects pedal; in principle, it can be applied to any LFO-modulated time-varying effect. In the best case, an error-to-signal ratio of 1.3% is achieved when modelling a flanger pedal,
and previous work has shown that this corresponds to the model
being nearly indistinguishable from the target device.
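To make the target effect class concrete, a flanger is essentially a short delay line whose delay time is swept by an LFO and mixed back with the dry signal. The sketch below is a generic digital flanger with linear interpolation, not the paper's neural model or any particular pedal; all parameter values are illustrative.

```python
import numpy as np

def flanger(x, fs, rate_hz=0.5, depth_ms=2.0, base_ms=1.0, mix=0.7):
    """LFO-modulated delay (flanger) with linear-interpolated fractional delay."""
    n = np.arange(len(x))
    lfo = 0.5 * (1 + np.sin(2 * np.pi * rate_hz * n / fs))   # LFO in [0, 1]
    delay = (base_ms + depth_ms * lfo) * fs / 1000.0         # delay in samples
    idx = n - delay                                          # fractional read index
    i0 = np.floor(idx).astype(int)
    frac = idx - i0
    i0c = np.clip(i0, 0, len(x) - 1)
    i1c = np.clip(i0 + 1, 0, len(x) - 1)
    delayed = np.where(idx >= 0,
                       (1 - frac) * x[i0c] + frac * x[i1c],  # linear interpolation
                       0.0)
    return x + mix * delayed                                 # dry + wet mix
```

The LFO signal here is exactly the kind of conditioning input the grey-box model receives alongside the unprocessed audio, which is what gives the trained model control over modulation rate and shape.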