Fade-in Control for Feedback Delay Networks

In virtual acoustics, it is common to simulate the early part of a Room Impulse Response using approaches from geometrical acoustics and the late part using Feedback Delay Networks (FDNs). To transition from the early to the late part, it is useful to fade in the FDN response gradually. We propose two methods to control the fade-in, one based on double decays and the other based on modal beating. We use modal analysis to explain the two concepts for incorporating this fade-in behaviour entirely within the IIR structure of a multiple-input multiple-output FDN. We present design equations that allow the fade-in time to be placed at an arbitrary point within its derived limit.
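
As a rough illustration of the double-decay intuition (not the paper's design equations), the difference of two exponential decays rises before it falls; a minimal numpy sketch with arbitrary, assumed time constants:

```python
import numpy as np

# Illustrative sketch only: a fade-in envelope as the difference of two
# exponential decays. Time constants are assumed values, not derived ones.
fs = 48000                        # sample rate (Hz)
t = np.arange(2 * fs) / fs        # two seconds of time
tau_slow, tau_fast = 0.8, 0.1     # decay time constants (s), chosen arbitrarily

env = np.exp(-t / tau_slow) - np.exp(-t / tau_fast)  # rises, then decays
t_peak = t[np.argmax(env)]        # the fade-in time is where the envelope peaks
print(f"fade-in peaks at {1000 * t_peak:.1f} ms")
```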

Delay Network Architectures for Room and Coupled Space Modeling

Feedback delay network reverberators have decay filters associated with each delay line to model the frequency-dependent reverberation time (T60) of a space. The decay filters are typically designed such that all delay lines independently produce the same T60 frequency response. However, in real rooms there are multiple concurrent T60 responses that depend on the geometry and the physical properties of the materials present in the rooms. In this paper, we propose the Grouped Feedback Delay Network (GFDN), in which groups of delay lines are given different target T60s. We use the GFDN to simulate coupled rooms, where one room is significantly larger than the other. We also simulate rooms with different materials, with unique decay filters associated with each delay line group, designed to represent the T60 characteristics of a particular material. The T60 filters are designed to emulate the materials' absorption characteristics with minimal computation. We discuss the design of the mixing matrix to control inter- and intra-group mixing, and show how the amount of mixing affects the behavior of the room modes. Finally, we discuss the inclusion of air absorption filters on each delay line and physically motivated room resizing techniques with the GFDN.
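
A minimal sketch of the grouping idea, using the standard FDN relation that a delay line of m samples needs gain g = 10^(-3 m / (fs T60)) to lose 60 dB after T60 seconds; the delay lengths and T60 values below are assumptions for illustration, not the paper's:

```python
import numpy as np

fs = 48000
delays_small = np.array([1031, 1327, 1523, 1801])  # samples, smaller-room group
delays_large = np.array([2917, 3373, 3923, 4463])  # samples, larger-room group
t60_small, t60_large = 0.6, 2.4                    # target T60s (s), assumed

def decay_gains(delays, t60):
    # per-line gain that loses 60 dB after t60 seconds of recirculation
    return 10.0 ** (-3.0 * delays / (fs * t60))

print(decay_gains(delays_small, t60_small))
print(decay_gains(delays_large, t60_large))
```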

Energy-Preserving Time-Varying Schroeder Allpass Filters

In artificial reverb algorithms, gains are commonly varied over time to break up temporal patterns, improving quality. We propose a family of novel Schroeder-style allpass filters that are energy-preserving under arbitrary, continuous changes of their gains over time. All of them are canonic in delays, and some are also canonic in multiplies. This yields several structures that are novel even in the time-invariant case. Special cases for cascading and nesting these structures with a reduced number of multipliers are shown as well. The proposed structures should be useful in artificial reverb applications and other time-varying audio effects based on allpass filters, especially where allpass filters are embedded in feedback loops and stability may be an issue.
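
For reference, here is the classic time-invariant Schroeder allpass in its familiar lattice form; naively modulating g in this structure is exactly what is not energy-preserving, which motivates the proposed filters (whose structures are not reproduced here):

```python
import numpy as np

def schroeder_allpass(x, g, m):
    """Time-invariant Schroeder allpass: H(z) = (-g + z^-m) / (1 - g z^-m)."""
    y = np.zeros(len(x))
    buf = np.zeros(m)              # delay line holding past values of v
    idx = 0                        # circular read/write position
    for n in range(len(x)):
        v_delayed = buf[idx]       # v[n - m]
        v = x[n] + g * v_delayed   # recursive path
        y[n] = -g * v + v_delayed  # feedforward path
        buf[idx] = v
        idx = (idx + 1) % m
    return y
```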

Perceptual Evaluation of Mitigation Approaches of Impairments Due to Spatial Undersampling in Binaural Rendering of Spherical Microphone Array Data: Dry Acoustic Environments

Employing a finite number of discrete microphones, instead of the continuous distribution assumed in theory, reduces the physical accuracy of sound field representations captured by a spherical microphone array. A number of approaches have been proposed in the literature to mitigate the perceptual impairment that arises when the captured sound fields are reproduced binaurally. We recently presented a perceptual evaluation of a representative set of approaches in conjunction with reverberant acoustic environments. This paper presents a similar study, but with acoustically dry environments with reverberation times of less than 0.25 s. We examined the Magnitude Least-Squares algorithm, the Bandwidth Extraction Algorithm for Microphone Arrays, Spherical Head Filters, spherical harmonics Tapering, and Spatial Subsampling, all up to a spherical harmonics order of 7. Although dry environments violate some of the assumptions underlying some of the approaches, we can confirm the results of our previous study: most approaches achieve an improvement, and the magnitude of the improvement is comparable across approaches and acoustic environments.

Interaural Cues Cartography: Localization Cues Repartition for Three Spatialization Methods

The Synthetic Transaural Audio Rendering (STAR) method, first introduced at DAFx-06 and then enhanced at DAFx-19, is a perceptual approach to sound spatialization that aims to reproduce the acoustic cues at the ears of the listener using loudspeakers. To validate the method, several comparisons with state-of-the-art spatialization methods (VBAP and HOA) were conducted. Previously, quality comparisons with human subjects were made, providing meaningful subjective results in real conditions. In this article, an objective comparison is proposed, using acoustic cue error maps. This cartography enables us to study the spatialization effect in a 2D space, for a listening position within an audience, and thus not necessarily located at the center. Two approaches are taken: the first simulates the binaural signals for a virtual KEMAR manikin, in ideal conditions and with a fine resolution; the second records these binaural signals using a real KEMAR manikin, providing real data with reverberation, though with a coarser resolution. In both cases, the acoustic cues were derived from the binaural signals (either simulated or measured) and compared to the reference value taken at the center of the octophonic loudspeaker configuration. The resulting error maps display encouraging results, with our STAR method producing the smallest error under both simulated and experimental conditions.
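
For orientation, two classic interaural cues can be computed from a binaural signal pair as sketched below; the paper's exact cue definitions and error metrics may differ:

```python
import numpy as np

def interaural_cues(left, right, fs):
    # Interaural Level Difference (dB): broadband energy ratio
    ild_db = 10.0 * np.log10(np.sum(left**2) / np.sum(right**2))
    # Interaural Time Difference (s): peak of the cross-correlation
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # lag in samples
    return ild_db, lag / fs
```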

Neural Parametric Equalizer Matching Using Differentiable Biquads

This paper proposes a neural network for carrying out parametric equalizer (EQ) matching. The novelty of this neural network solution is that it can be optimized directly in the frequency domain by means of differentiable biquads, rather than relying solely on a loss over parameter values, which does not correlate directly with the system output. We compare the performance of the proposed neural network approach with that of a baseline algorithm based on a convex relaxation of the problem. We observe that the neural network can provide better matching than the baseline approach because it directly attempts to solve the non-convex problem. Moreover, we show that the same network trained with only a parameter loss is insufficient for the task, despite the fact that it matches the underlying EQ parameters better than one trained with a combination of spectral and parameter losses.
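
A minimal sketch of what "differentiable biquad" can mean in practice, assuming RBJ-cookbook peaking-EQ coefficients (the paper's network and loss are not reproduced here): the magnitude response is computed in closed form from the EQ parameters, so a spectral loss backpropagates to them.

```python
import torch

def peak_biquad_mag(f0, gain_db, q, freqs, fs):
    """Magnitude response of an RBJ peaking-EQ biquad; all arguments are
    tensors, so gradients flow back to f0, gain_db, and q."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * torch.pi * f0 / fs
    alpha = torch.sin(w0) / (2.0 * q)
    b = torch.stack([1 + alpha * A, -2 * torch.cos(w0), 1 - alpha * A])
    a = torch.stack([1 + alpha / A, -2 * torch.cos(w0), 1 - alpha / A])
    w = 2.0 * torch.pi * freqs / fs            # evaluation grid (rad/sample)

    def mag(c):                                # |c0 + c1 e^-jw + c2 e^-2jw|
        re = c[0] + c[1] * torch.cos(w) + c[2] * torch.cos(2 * w)
        im = -(c[1] * torch.sin(w) + c[2] * torch.sin(2 * w))
        return torch.sqrt(re**2 + im**2)

    return mag(b) / mag(a)
```

A log-magnitude loss between this response and a target response is then differentiable end to end.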

Relative Music Loudness Estimation Using Temporal Convolutional Networks and a CNN Feature Extraction Front-End

Relative music loudness estimation is a MIR task that consists of dividing audio into segments of three classes: Foreground Music, Background Music, and No Music. Given the temporal correlation of music, we approach the task using a type of network with the ability to model temporal context: the Temporal Convolutional Network (TCN). We propose two architectures: a TCN, and a novel architecture resulting from the combination of a TCN with a Convolutional Neural Network (CNN) front-end, which we name CNN-TCN. We expect the CNN front-end to act as a feature extraction strategy that makes more efficient use of the network's parameters. We use the OpenBMAT dataset to train and test 40 TCN and 80 CNN-TCN models with two grid searches over a set of hyper-parameters. We compare our models with the two best algorithms submitted to the music detection and relative music loudness estimation tasks at MIREX 2019. All our models outperform the MIREX algorithms, even with fewer parameters. The CNN-TCN emerges as the best architecture, as all its models outperform all TCN models. We show that adding a CNN front-end to a TCN can actually reduce the number of parameters of the network while improving performance. The CNN front-end effectively works as a feature extractor, producing consistent patterns that identify different combinations of music and non-music sounds, and also helps produce a smoother output compared to the TCN models.
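
For readers unfamiliar with TCNs, a sketch of one dilated causal convolution block is given below; the paper's layer sizes, activations, and CNN front-end are assumptions left out here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TCNBlock(nn.Module):
    """One dilated causal convolution block with a residual connection."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left-pad to stay causal
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                         # x: (batch, channels, time)
        y = F.pad(x, (self.pad, 0))               # pad past only, not future
        return x + self.act(self.conv(y))         # residual connection
```

Stacking such blocks with growing dilations (1, 2, 4, ...) is what gives a TCN its long temporal context.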

Neural Modelling of Time-Varying Effects

This paper proposes a grey-box, neural-network-based approach to modelling LFO-modulated time-varying effects. The neural network model receives both the unprocessed audio and the LFO signal as input, allowing complete control over the model's LFO frequency and shape. The neural networks are trained using guitar audio, which has to be processed by the target effect and annotated with the predicted LFO signal before training. A measurement signal based on regularly spaced chirps was used to accurately predict the LFO signal. The model architecture has previously been shown to be capable of running in real time on a modern desktop computer whilst using relatively little processing power. We validate our approach by creating models of both a phaser and a flanger effects pedal; in principle, it can be applied to any LFO-modulated time-varying effect. In the best case, an error-to-signal ratio of 1.3% is achieved when modelling a flanger pedal, and previous work has shown that this corresponds to the model being nearly indistinguishable from the target device.
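
One simple way to realize such conditioning, sketched here as an assumption rather than the paper's actual architecture, is to stack the LFO with the audio as a second input channel:

```python
import torch
import torch.nn as nn

audio = torch.randn(1, 1, 4096)                   # (batch, channels, time)
t = torch.arange(4096) / 48000.0
lfo = torch.sin(2 * torch.pi * 0.5 * t).view(1, 1, -1)  # 0.5 Hz sine LFO

x = torch.cat([audio, lfo], dim=1)                # audio + LFO as 2 channels
net = nn.Conv1d(2, 1, kernel_size=65, padding=32) # stand-in for the real model
y = net(x)                                        # output depends on LFO phase
```

Because the LFO is an explicit input, its frequency and shape can be changed freely at inference time.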

Onset-Informed Source Separation Using Non-Negative Matrix Factorization With Binary Masks

This paper describes a new onset-informed source separation method based on non-negative matrix factorization (NMF) with binary masks. Many previous approaches to separating a target instrument sound from polyphonic music have used side information about the target that is time-consuming to prepare. The proposed method instead leverages the onsets of the target instrument sound to facilitate separation. Onsets are useful information that users can easily generate by tapping along while listening to the target in the music. To utilize onsets in NMF-based sound source separation, we introduce binary masks that represent the on/off states of the target sound. The binary masks are formulated as Markov chains based on the temporal continuity of musical instrument sounds. With the binary masks, an onset can be handled as a time frame in which the mask changes from the off state to the on state. The proposed model is inferred by Gibbs sampling, in which the target sound source can be sampled efficiently by using its onsets. We conducted experiments to separate the target melody instrument from recorded polyphonic music. Separation results showed about 2 to 10 dB improvement in target-source-to-residual-noise ratio compared to the polyphonic sound. Even when some onsets were missed or misaligned, the method remained effective for target sound source separation.
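
For context, plain NMF on a magnitude spectrogram looks as follows (Euclidean multiplicative updates, with made-up data); the paper's Gibbs-sampled binary-mask model extends this baseline:

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.abs(rng.standard_normal((513, 200)))  # stand-in magnitude spectrogram
K = 8                                        # number of NMF components
W = rng.random((513, K)) + 1e-3              # spectral templates
H = rng.random((K, 200)) + 1e-3              # activations

for _ in range(100):                         # multiplicative updates
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ (H @ H.T) + 1e-12)

# Onset information would gate the target's activations in time; the paper
# instead infers these on/off states as Markov-chain binary masks.
```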

Differentiable IIR Filters for Machine Learning Applications

In this paper, we present an approach to using traditional digital IIR filter structures inside deep-learning networks trained using backpropagation. We establish the link between such structures and recurrent neural networks. Three different differentiable IIR filter topologies are presented and compared against each other and an established baseline. Additionally, a simple Wiener-Hammerstein model using differentiable IIR filters as its filtering component is presented and trained on a guitar signal played through a Boss DS-1 guitar pedal.
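
The link to recurrent networks can be seen in a minimal sketch: a one-pole IIR unrolled over time is a linear RNN, so autograd reaches its coefficients. The topology below is a toy stand-in, not one of the paper's three.

```python
import torch

a = torch.tensor(0.5, requires_grad=True)    # feedback coefficient
b = torch.tensor(0.5, requires_grad=True)    # input gain

def one_pole(x, a, b):
    """y[n] = b*x[n] + a*y[n-1], unrolled so autograd sees the recursion."""
    y, ys = torch.zeros(()), []
    for n in range(x.shape[0]):
        y = b * x[n] + a * y
        ys.append(y)
    return torch.stack(ys)

x = torch.randn(64)
loss = one_pole(x, a, b).pow(2).mean()
loss.backward()                              # gradients reach a and b
```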