Stable Limit Cycles as Tunable Signal Sources
This paper presents a method for synthesizing audio signals from
nonlinear dynamical systems exhibiting stable limit cycles, with
control over frequency and amplitude independent of changes to
the system’s internal parameters. Using the van der Pol oscillator
and the Brusselator as case studies, we demonstrate how frequency and amplitude are decoupled from the system’s parameters by rescaling the angular frequency and normalizing the amplitude extrema. Practical
implementation considerations are discussed, as are the limits and
challenges of this approach. The method’s validity is evaluated experimentally, and synthesis examples show the application of tunable nonlinear oscillators in sound design, including the generation
of transients in FM synthesis by means of a van der Pol oscillator
and a Supersaw oscillator bank based on the Brusselator.
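To make the decoupling concrete, here is a minimal Python sketch of a van der Pol oscillator tuned by rescaling the angular frequency and levelled by normalizing the output extrema. The integrator (semi-implicit Euler) and all constants are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def van_der_pol_tone(f0, mu, sr=48000, dur=1.0):
    # Rescaled van der Pol: x'' - mu*w*(1 - x^2)*x' + w^2*x = 0,
    # so the limit cycle repeats roughly f0 times per second.
    # Semi-implicit Euler integration; integrator and constants are
    # illustrative choices, not the paper's implementation.
    w = 2.0 * np.pi * f0
    dt = 1.0 / sr
    x, y = 2.0, 0.0                      # start near the limit cycle
    out = np.empty(int(dur * sr))
    for i in range(out.size):
        y += dt * (mu * w * (1.0 - x * x) * y - w * w * x)
        x += dt * y
        out[i] = x
    return out / np.max(np.abs(out))     # normalize amplitude extrema

tone = van_der_pol_tone(f0=220.0, mu=1.5)  # pitch set independently of mu
```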
Lookup Table Based Audio Spectral Transformation
We present a unified visual interface for flexible spectral audio manipulation based on editable lookup tables (LUTs). In the proposed
approach, the audio spectrum is visualized as a two-dimensional
color map of frequency versus amplitude, serving as an editable
lookup table for modifying the sound. This single tool can replicate common audio effects such as equalization, pitch shifting, and
spectral compression, while also enabling novel sound transformations through creative combinations of adjustments. By consolidating these capabilities into one visual platform, the system has
the potential to streamline audio-editing workflows and encourage
creative experimentation. The approach also supports real-time
processing, providing immediate auditory feedback in an interactive graphical environment. Overall, this LUT-based method offers
an accessible yet powerful framework for designing and applying
a broad range of spectral audio effects through intuitive visual manipulation.
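A minimal sketch of the LUT idea follows, assuming a per-bin table indexed by quantized magnitude level; the interface and quantization scheme are illustrative assumptions, not the paper's design.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_spectral_lut(x, lut, sr=48000, nperseg=2048):
    # lut is assumed to be an (n_bins, n_levels) array of gains: for
    # each STFT frequency bin, the column index is a quantized input
    # magnitude level. Interface and quantization are illustrative
    # assumptions, not the paper's design.
    f, t, Z = stft(x, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    n_levels = lut.shape[1]
    peak = mag.max(axis=1, keepdims=True) + 1e-12        # per-bin reference
    idx = np.minimum((mag / peak * (n_levels - 1)).astype(int), n_levels - 1)
    gain = np.take_along_axis(lut, idx, axis=1)          # look up per-bin gain
    _, y = istft(mag * gain * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    return y

# Example: a LUT of constant rows acts as a graphic equalizer.
lut = np.ones((2048 // 2 + 1, 64))
lut[:64, :] = 0.25        # attenuate the lowest bins
```

A table whose rows are constant gains reduces to a graphic equalizer, while rows whose gain decreases at high input levels act as a spectral compressor.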
A Non-Uniform Subband Implementation of an Active Noise Control System for Snoring Reduction
Snoring noise can be extremely annoying and can negatively
affect people’s social lives. To reduce this problem, active noise
control (ANC) systems can be adopted for snoring cancellation.
Recently, adaptive subband systems have been developed to improve the convergence rate and reduce the computational complexity of the ANC algorithm. Several structures have been proposed
with different approaches. This paper proposes a non-uniform subband adaptive filtering (SAF) structure to improve a feedforward
active noise control algorithm. The non-uniform band distribution provides higher frequency resolution at low frequencies, where the snoring noise is most concentrated. Several experiments
have been carried out to evaluate the proposed system in comparison with a reference ANC system that uses a uniform subband distribution.
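For orientation, the sketch below shows the single-band FxLMS core that such subband structures run once per band (with narrower bands at low frequencies in the non-uniform case). The buffer handling and step size are illustrative assumptions, and the analysis/synthesis filter banks are not shown.

```python
import numpy as np

def fxlms_anc(x, d, s, s_hat, n_taps=64, mu=5e-4):
    # x: reference (snoring) signal, d: disturbance at the error mic,
    # s: true secondary path, s_hat: its estimate. Buffer handling and
    # step size are illustrative assumptions.
    w = np.zeros(n_taps)                       # adaptive control filter
    y_hist = np.zeros(len(s))                  # anti-noise history
    xf = np.convolve(x, s_hat)[:len(x)]        # filtered-x reference
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = x[max(0, n - n_taps + 1):n + 1][::-1]
        y = w[:len(x_buf)] @ x_buf             # anti-noise sample
        y_hist = np.roll(y_hist, 1)
        y_hist[0] = y
        e[n] = d[n] - s @ y_hist               # residual at the error mic
        xf_buf = xf[max(0, n - n_taps + 1):n + 1][::-1]
        w[:len(xf_buf)] += mu * e[n] * xf_buf  # FxLMS weight update
    return e
```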
Compositional Application of a Chaotic Dynamical System for the Synthesis of Sounds
This paper reviews compositional applications developed in recent years using a chaotic dynamical system in different
sound synthesis processes. The use of chaotic dynamical systems
in computer music has been a widespread practice for some time
now. The experimentation presented in this work shows the use
of a specific chaotic system, Chua’s oscillator, within different
sound synthesis methods. A family of new musical instruments
has been developed, exploiting the potential of
this chaotic system to produce complex timbres and sounds. The
instruments have been used for the creation of musical pieces and
for the realization of live electronics performances.
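As a reference point, a minimal Python sketch of Chua's oscillator with the standard double-scroll parameters follows; the integration step and output normalization are illustrative choices, not those of the instruments described.

```python
import numpy as np

def chua_signal(n_samples, dt=1e-3, alpha=15.6, beta=28.0,
                m0=-8 / 7, m1=-5 / 7):
    # Standard double-scroll parameters; forward Euler integration is
    # an illustrative choice.
    def f(x):  # piecewise-linear Chua diode characteristic
        return m1 * x + 0.5 * (m0 - m1) * (abs(x + 1) - abs(x - 1))

    x, y, z = 0.7, 0.0, 0.0
    out = np.empty(n_samples)
    for i in range(n_samples):
        dx = alpha * (y - x - f(x))
        dy = x - y + z
        dz = -beta * y
        x += dt * dx
        y += dt * dy
        z += dt * dz
        out[i] = x
    return out / np.max(np.abs(out))   # normalize for use as audio/control
```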
DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions
This study introduces a novel and interpretable model, DiffVox,
for matching vocal effects in music production. DiffVox, short
for “Differentiable Vocal Fx”, integrates parametric equalisation,
dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for
parameter estimation. Vocal presets are retrieved from two datasets,
comprising 70 tracks from MedleyDB and 365 tracks from a private collection. Analysis of parameter correlations reveals strong
relationships between effects and parameters, such as the highpass and low-shelf filters often working together to shape the low
end, and the delay time correlating with the intensity of the delayed signals. Principal component analysis reveals connections to
McAdams’ timbre dimensions, where the most crucial component
modulates the perceived spaciousness while the secondary components influence spectral brightness. Statistical testing confirms
the non-Gaussian nature of the parameter distribution, highlighting
the complexity of the vocal effects space. These initial findings on
the parameter distributions set the foundation for future research
in vocal effects modelling and automatic mixing.
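The core optimization idea can be sketched as follows: a differentiable effect module with a handful of parameters is fitted to a reference by gradient descent on an audio-domain loss. The toy delay effect below is a hypothetical stand-in for DiffVox's EQ/compressor/delay/reverb chain.

```python
import torch

class TinyDelayFx(torch.nn.Module):
    # Hypothetical stand-in for DiffVox's effect chain: a single
    # delay tap with learnable wet/dry mix and tap gain.
    def __init__(self, delay=2400):
        super().__init__()
        self.delay = delay
        self.mix = torch.nn.Parameter(torch.tensor(0.1))
        self.gain = torch.nn.Parameter(torch.tensor(0.2))

    def forward(self, x):
        wet = torch.zeros_like(x)
        wet[self.delay:] = self.gain * x[:-self.delay]
        return (1 - self.mix) * x + self.mix * wet

dry = torch.randn(48000)
wet_t = torch.zeros_like(dry)
wet_t[2400:] = 0.5 * dry[:-2400]
target = 0.7 * dry + 0.3 * wet_t          # toy "vocal preset" to recover

fx = TinyDelayFx()
opt = torch.optim.Adam(fx.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = torch.mean((fx(dry) - target) ** 2)   # audio-domain matching loss
    loss.backward()
    opt.step()
```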
Improving Lyrics-to-Audio Alignment Using Frame-wise Phoneme Labels with Masked Cross Entropy Loss
This paper addresses the task of lyrics-to-audio alignment, which
involves synchronizing textual lyrics with corresponding music
audio. Most publicly available datasets for this task provide annotations only at the line or word level. This poses a challenge
for training lyrics-to-audio models due to the lack of frame-wise
phoneme labels. However, we find that phoneme labels can be
partially derived from word-level annotations: for single-phoneme
words, all frames corresponding to the word can be labeled with
the same phoneme; for multi-phoneme words, phoneme labels can
be assigned at the first and last frames of the word. To leverage
this partial information, we construct a mask for those frames and
propose a masked frame-wise cross-entropy (CE) loss that considers only frames with known phoneme labels. As a baseline model,
we adopt an autoencoder trained with a Connectionist Temporal
Classification (CTC) loss and a reconstruction loss. We then enhance training by incorporating the proposed masked frame-wise CE loss. Experimental results show that this loss improves alignment performance. In comparison to other state-of-the-art models, our model
provides a comparable Mean Absolute Error (MAE) of 0.216 seconds and the best Median Absolute Error (MedAE) of 0.041 seconds on the Jamendo test set.
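A minimal PyTorch sketch of such a masked frame-wise CE loss follows; the tensor shapes and mask convention are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def masked_frame_ce(logits, frame_labels, mask):
    # logits:       (T, n_phonemes) frame-wise predictions
    # frame_labels: (T,) phoneme indices (arbitrary where mask is 0)
    # mask:         (T,) float, 1.0 where the label is known, i.e. all
    #               frames of single-phoneme words and the first/last
    #               frames of multi-phoneme words.
    ce = F.cross_entropy(logits, frame_labels, reduction="none")  # (T,)
    return (ce * mask).sum() / mask.sum().clamp(min=1)
```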
Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search
Recent studies on classifying electric guitar effects have achieved
high accuracy, particularly with deep learning techniques. However, these studies often rely on simplified datasets consisting
mainly of single notes rather than realistic guitar recordings.
Moreover, in the specific field of effect chain estimation, the literature tends to rely on large models, making them impractical for
real-time or resource-constrained applications. In this work, we
recorded realistic guitar performances using four different guitars
and created three datasets by applying a chain of five effects with
increasing complexity: (1) fixed order and parameters, (2) fixed order with randomly sampled parameters, and (3) random order and
parameters. We also propose a novel Neural Architecture Search
method aimed at discovering accurate yet compact convolutional
neural network models to reduce power and memory consumption.
We compared its performance to a basic random search strategy,
showing that our custom Neural Architecture Search outperformed
random search in identifying models that balance accuracy and
complexity. We found that the number of convolutional and pooling layers becomes increasingly important as dataset complexity
grows, while dense layers have less impact. Additionally, among
the effects, tremolo was identified as the most challenging to classify.
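In outline, an evolutionary search of this kind mutates layer counts and keeps the fittest candidates under an accuracy/complexity objective. The sketch below uses a toy fitness proxy in place of actually training each candidate CNN; the search space and objective here are illustrative assumptions, not the paper's method.

```python
import random

def random_arch():
    # Hypothetical search space: counts of conv, pooling, and dense layers.
    return {"conv": random.randint(1, 6),
            "pool": random.randint(0, 4),
            "dense": random.randint(1, 3)}

def mutate(arch):
    child = dict(arch)
    key = random.choice(list(child))
    lo = 0 if key == "pool" else 1
    child[key] = max(lo, child[key] + random.choice([-1, 1]))
    return child

def fitness(arch):
    # Placeholder objective: in the real system this would train the
    # candidate CNN and return validation accuracy penalized by model
    # size; a toy proxy stands in so the loop is runnable.
    accuracy_proxy = 1.0 - 0.05 * abs(arch["conv"] - 4) - 0.03 * abs(arch["pool"] - 2)
    size_penalty = 0.01 * (arch["conv"] + arch["dense"])
    return accuracy_proxy - size_penalty

population = [random_arch() for _ in range(16)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]                        # elitist selection
    population = parents + [mutate(random.choice(parents)) for _ in range(12)]

best = max(population, key=fitness)
```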
Inference-Time Structured Pruning for Real-Time Neural Network Audio Effects
Structured pruning is a technique for reducing the computational
load and memory footprint of neural networks by removing structured subsets of parameters according to a predefined schedule
or ranking criterion. This paper investigates the application of
structured pruning to real-time neural network audio effects, focusing on both feedforward networks and recurrent architectures.
We evaluate multiple pruning strategies at inference time, without retraining, and analyze their effects on model performance. To
quantify the trade-off between parameter count and audio fidelity,
we construct a theoretical model of the approximation error as a
function of network architecture and pruning level. The resulting bounds establish a principled relationship between pruning-induced sparsity and functional error, enabling informed deployment of neural audio effects in constrained real-time environments.
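One common ranking criterion, pruning hidden units by the L2 norm of their weights without retraining, can be sketched as follows; this is a simplified illustration on a two-layer feedforward block, not the paper's full procedure.

```python
import torch

def prune_hidden_units(lin_in, lin_out, keep_ratio=0.5):
    # Rank hidden units by the L2 norm of their input weights and keep
    # the top fraction, rebuilding smaller Linear layers in place of
    # the originals, with no retraining.
    scores = lin_in.weight.norm(dim=1)             # one score per hidden unit
    k = max(1, int(keep_ratio * scores.numel()))
    keep = scores.topk(k).indices.sort().values    # indices of kept units
    new_in = torch.nn.Linear(lin_in.in_features, k)
    new_out = torch.nn.Linear(k, lin_out.out_features)
    with torch.no_grad():
        new_in.weight.copy_(lin_in.weight[keep])
        new_in.bias.copy_(lin_in.bias[keep])
        new_out.weight.copy_(lin_out.weight[:, keep])
        new_out.bias.copy_(lin_out.bias)
    return new_in, new_out

lin_in = torch.nn.Linear(16, 64)
lin_out = torch.nn.Linear(64, 16)
p_in, p_out = prune_hidden_units(lin_in, lin_out, keep_ratio=0.25)
```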
Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches
Accurately estimating nonlinear audio effects without access to
paired input-output signals remains a challenging problem. This
work studies unsupervised probabilistic approaches for solving this
task. We introduce a method, novel for this application, based
on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. We compare this method with a
previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the
effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show
that the diffusion-based approach provides more stable results and
is less sensitive to data availability, while the adversarial approach
is superior at estimating more pronounced distortion effects. Our
findings contribute to the robust unsupervised blind estimation of
audio effects, demonstrating the potential of diffusion models for
system identification in music technology.
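For concreteness, a gray-box effect operator of the kind whose parameters such methods estimate blindly might look like the following sketch: a learnable drive, a fixed tanh nonlinearity, and a learnable post-filter. The paper's operators are more elaborate; this is an illustration only.

```python
import torch

class GrayBoxDistortion(torch.nn.Module):
    # Illustrative gray-box distortion: learnable pre-gain (drive),
    # fixed tanh nonlinearity, learnable FIR tone filter.
    def __init__(self, n_taps=31):
        super().__init__()
        self.pre_gain = torch.nn.Parameter(torch.tensor(1.0))
        self.post = torch.nn.Conv1d(1, 1, n_taps, padding=n_taps // 2)

    def forward(self, x):        # x: (batch, 1, samples)
        return self.post(torch.tanh(self.pre_gain * x))
```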
Empirical Results for Adjusting Truncated Backpropagation Through Time While Training Neural Audio Effects
This paper investigates the optimization of Truncated Backpropagation Through Time (TBPTT) for training neural networks in
digital audio effect modeling, with a focus on dynamic range compression. The study evaluates key TBPTT hyperparameters – sequence number, batch size, and sequence length – and their influence on model performance. Using a convolutional-recurrent architecture, we conduct extensive experiments across datasets with
and without conditioning by user controls. Results demonstrate
that carefully tuning these parameters enhances model accuracy
and training stability, while also reducing computational demands.
Objective evaluations confirm improved performance with optimized settings, while subjective listening tests indicate that the
revised TBPTT configuration maintains high perceptual quality.
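A minimal sketch of the TBPTT scheme under study follows, using a toy recurrent model as a stand-in for the paper's convolutional-recurrent architecture; all specifics here are illustrative assumptions.

```python
import torch

class TinyGRUFx(torch.nn.Module):
    # Toy recurrent audio-effect model (stand-in for the paper's
    # convolutional-recurrent architecture).
    def __init__(self, hidden=16):
        super().__init__()
        self.rnn = torch.nn.GRU(1, hidden, batch_first=True)
        self.out = torch.nn.Linear(hidden, 1)

    def forward(self, x, h=None):      # x: (batch, time, 1)
        z, h = self.rnn(x, h)
        return self.out(z), h

def tbptt_step(model, opt, x, y, seq_len=2048, n_seqs=4):
    # The signal is split into n_seqs chunks of seq_len samples;
    # gradients flow only within a chunk, and the hidden state is
    # detached at chunk boundaries. seq_len, n_seqs, and (with
    # batching) batch size are the hyperparameters the paper tunes.
    h = None
    for k in range(n_seqs):
        sl = slice(k * seq_len, (k + 1) * seq_len)
        y_hat, h = model(x[:, sl], h)
        loss = torch.mean((y_hat - y[:, sl]) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
        h = h.detach()                 # truncate the gradient path

model = TinyGRUFx()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 8192, 1)            # batch of input audio
y = torch.tanh(2 * x)                  # toy target effect
tbptt_step(model, opt, x, y)
```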