Training Neural Models of Nonlinear Multi-Port Elements Within Wave Digital Structures Through Discrete-Time Simulation
Neural networks have been applied within the Wave Digital Filter (WDF) framework as data-driven models for nonlinear multi-port circuit elements. Conventionally, these models are trained on wave variables obtained by sampling the current-voltage characteristic of the considered nonlinear element before being incorporated into the circuit WDF implementation. However, isolating multi-port elements for this process can be challenging, as their nonlinear behavior often depends on dynamic effects that emerge from interactions with the surrounding circuit. In this paper, we propose a novel approach for training neural models of nonlinear multi-port elements directly within a circuit's Wave Digital (WD) discrete-time implementation, relying solely on circuit input-output voltage measurements. Exploiting the differentiability of WD simulations, we embed the neural network into the simulation process and optimize its parameters using gradient-based methods by minimizing a loss function defined over the circuit output voltage. Experimental results demonstrate the effectiveness of the proposed approach in accurately capturing the nonlinear circuit behavior, while preserving the interpretability and modularity of WDFs.
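
To make the training idea concrete, here is a minimal sketch under stated assumptions. The hypothetical circuit is a resistive voltage source in parallel with a capacitor, with the nonlinear element at the adapted root of a three-port parallel adaptor (a one-port simplification of the multi-port case). A two-parameter wave-domain curve (`reflect`) stands in for the neural network, and central finite differences stand in for automatic differentiation. Component values and all function names are illustrative, not the authors' implementation. What the sketch shows is that the loss is computed only on the simulated output voltage, yet gradients reach the nonlinear element's parameters through the discrete-time simulation.

```python
import math

# Hypothetical values: resistive source in parallel with a capacitor,
# nonlinear one-port at the adapted root of a 3-port parallel adaptor.
FS = 48_000.0
RS = 1_000.0
C = 33e-9
G_S = 1.0 / RS            # source port conductance
G_C = 2.0 * C * FS        # trapezoidal capacitor: R_C = T / (2C)
G_ROOT = G_S + G_C        # adapted root: its outgoing wave does not depend on its input


def reflect(a, theta):
    """Hypothetical stand-in for the neural model: incident wave a -> reflected wave b."""
    w, g = theta
    v = w * math.tanh(g * a / w)   # port voltage saturates at +/- w
    return 2.0 * v - a             # voltage waves: v = (a + b) / 2


def simulate(vin, theta):
    """Discrete-time WD simulation; returns the voltage across the nonlinear element."""
    a_cap = 0.0                    # capacitor state: wave it reflects at the next sample
    out = []
    for vs in vin:
        a1, a2 = vs, a_cap         # waves entering the adaptor (source reflects b = Vs)
        b3 = (G_S * a1 + G_C * a2) / G_ROOT   # wave sent to the root element
        a3 = reflect(b3, theta)               # reflection from the nonlinear root
        v = 0.5 * (a3 + b3)                   # junction voltage = output voltage
        a_cap = 2.0 * v - a2                  # wave sent to the capacitor
        out.append(v)
    return out


def loss(theta, vin, target):
    """Mean squared error on the output voltage only."""
    y = simulate(vin, theta)
    return sum((p - t) ** 2 for p, t in zip(y, target)) / len(target)


def grad_fd(theta, vin, target, eps=1e-6):
    """Central finite differences: a stand-in for backpropagating through the simulation."""
    g = []
    for k in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[k] += eps
        dn[k] -= eps
        g.append((loss(up, vin, target) - loss(dn, vin, target)) / (2.0 * eps))
    return g


def train(theta, vin, target, steps=50, lr=1.0):
    """Gradient descent with simple backtracking, so the loss never increases."""
    history = [loss(theta, vin, target)]
    for _ in range(steps):
        g = grad_fd(theta, vin, target)
        best = history[-1]
        while lr > 1e-12:
            cand = [p - lr * gk for p, gk in zip(theta, g)]
            cand_loss = loss(cand, vin, target) if cand[0] > 0.0 else math.inf
            if cand_loss < best:
                theta, best, lr = cand, cand_loss, lr * 2.0
                break
            lr *= 0.5
        history.append(best)
    return theta, history


# Synthetic "measurement": the same model with known (hypothetical) parameters.
vin = [2.0 * math.sin(2.0 * math.pi * 200.0 * n / FS) for n in range(480)]
target = simulate(vin, (0.7, 1.0))
theta_fit, history = train([0.3, 0.5], vin, target)
print(theta_fit, history[0], history[-1])
```

In a real setup, an autodiff framework would replace `grad_fd`, and a small network with more inputs and outputs would replace `reflect` for a multi-port element.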
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single, potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced feedback delay network (FDN) is designed to generate perceptually plausible room impulse responses (RIRs). A multiple stimuli with hidden reference and anchor (MUSHRA) test and a two-alternative forced-choice (2AFC) discrimination task were conducted to compare the proposed method against ground-truth recordings and conventional RT-based approaches. The results show that the proposed system delivers robust performance in various scenarios, achieving highly plausible reverberation synthesis.
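
A rough sketch of the hybrid idea, assuming the simplest possible coupling: discrete reflections (standing in for ray-tracing output) are placed before a fixed mixing time, and a basic FDN tail starts at that mixing time. The FDN uses a Householder feedback matrix and per-line gains g_i = 10^(-3 d_i / (f_s T60)), so every recirculation path decays by 60 dB after T60 seconds. The delay lengths, T60, mixing time, and reflection list are hypothetical; the paper's advanced FDN and its RT/FDN coupling are not reproduced here.

```python
import math


def fdn_tail(delays, t60, fs, length):
    """Impulse response of a basic FDN with a Householder feedback matrix."""
    n = len(delays)
    # Per-line gain so that every path decays by 60 dB after t60 seconds.
    gains = [10.0 ** (-3.0 * d / (fs * t60)) for d in delays]
    bufs = [[0.0] * d for d in delays]   # circular delay lines
    pos = [0] * n
    out = []
    for t in range(length):
        x = 1.0 if t == 0 else 0.0       # unit impulse input
        reads = [bufs[i][pos[i]] for i in range(n)]
        out.append(sum(reads))
        s = (2.0 / n) * sum(reads)       # Householder A = I - (2/N) 1 1^T
        for i in range(n):
            bufs[i][pos[i]] = gains[i] * (reads[i] - s + x)
            pos[i] = (pos[i] + 1) % delays[i]
    return out


def hybrid_rir(early, tail, fs, mix_time):
    """Early reflections (time_s, amplitude) before mix_time, FDN tail from mix_time on."""
    m = int(round(mix_time * fs))
    rir = [0.0] * (m + len(tail))
    for time_s, amp in early:
        k = int(round(time_s * fs))
        if k < m:
            rir[k] += amp
    for k, v in enumerate(tail):
        rir[m + k] += v
    return rir


fs = 48_000
tail = fdn_tail([1031, 1327, 1523, 1871], t60=0.3, fs=fs, length=24_000)  # hypothetical values
early = [(0.002, 1.0), (0.005, 0.5), (0.011, -0.3)]   # hypothetical RT output
rir = hybrid_rir(early, tail, fs, mix_time=0.02)
```

Mutually prime delay lengths are a common choice to avoid coinciding echoes; a practical system would also match the tail's level and spectrum to the ray-traced part at the mixing time.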
Piano-SSM: Diagonal State Space Models for Efficient Midi-to-Raw Audio Synthesis
Deep State Space Models (SSMs) have shown remarkable performance in long-sequence reasoning tasks such as raw audio classification and audio generation. This paper introduces Piano-SSM, an end-to-end deep SSM neural network architecture designed to synthesize raw piano audio directly from MIDI input. The network requires no intermediate representations or domain-specific expert knowledge, simplifying training and improving accessibility. Quantitative evaluations on the MAESTRO dataset show that Piano-SSM achieves a Multi-Scale Spectral Loss (MSSL) of 7.02 at 16 kHz, outperforming DDSP-Piano v1 (MSSL of 7.09). At 24 kHz, Piano-SSM remains competitive with an MSSL of 6.75, close to DDSP-Piano v2's result of 6.58. On the MAPS dataset, Piano-SSM achieves an MSSL of 8.23, demonstrating its ability to generalize even when trained on very limited data. Further analysis highlights Piano-SSM's ability to train on high-sampling-rate audio while synthesizing audio at lower sampling rates, and explicitly links the resulting performance loss to aliasing effects. Additionally, the proposed model supports real-time causal inference through a custom header-only C++17 implementation. With single-core inference on an Intel Core i7-12700 processor at 4.5 GHz, the largest network synthesizes one second of audio at 44.1 kHz in 0.44 s, with a workload of 23.1 GFLOP/s and an input/output delay of 10.1 µs. The smallest network, at 16 kHz, needs only 0.04 s, with 2.3 GFLOP/s and a 2.6 µs input/output delay. These results underscore Piano-SSM's practical utility and efficiency in real-time audio synthesis applications.
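
Diagonal SSM layers reduce to an elementwise linear recurrence, which is what makes causal, sample-by-sample inference cheap. The sketch below shows that recurrence for a single input/output channel, with zero-order-hold discretization of continuous-time poles in the general style of diagonal SSMs. The pole values, state size, and class and function names are assumptions for illustration; Piano-SSM's actual layer structure and parameterization are not reproduced.

```python
import cmath


def zoh_discretize(poles, dt):
    """Zero-order-hold discretization of diagonal continuous-time poles (Re(p) < 0)."""
    lam = [cmath.exp(p * dt) for p in poles]
    b_scale = [(l - 1.0) / p for l, p in zip(lam, poles)]   # integral of exp(p*s) over one step
    return lam, b_scale


class DiagonalSSM:
    """Single-input single-output diagonal linear SSM, stepped causally one sample at a time."""

    def __init__(self, lam, b, c, d=0.0):
        if any(abs(l) >= 1.0 for l in lam):
            raise ValueError("all |lambda| must be < 1 for a stable recurrence")
        self.lam, self.b, self.c, self.d = lam, b, c, d
        self.x = [0j] * len(lam)

    def step(self, u):
        # x[k] = Lambda x[k-1] + B u[k]: O(N) work per sample, no look-ahead
        self.x = [l * xi + bi * u for l, xi, bi in zip(self.lam, self.x, self.b)]
        # y[k] = Re(C x[k]) + D u[k]
        return sum((ci * xi).real for ci, xi in zip(self.c, self.x)) + self.d * u


# Usage: one damped complex pole near 440 Hz (hypothetical values).
fs = 16_000
lam, b = zoh_discretize([complex(-50.0, 2 * cmath.pi * 440.0)], 1.0 / fs)
ssm = DiagonalSSM(lam, b, c=[1.0 + 0j])
impulse_response = [ssm.step(1.0 if k == 0 else 0.0) for k in range(fs // 10)]
```

Because each output sample depends only on the current input and the stored state, latency does not grow with sequence length, which is consistent with the real-time causal inference reported above. For reference, 0.44 s of compute per second of audio corresponds to a real-time factor of 0.44.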
Neural Sample-Based Piano Synthesis
Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physics-based piano models have been proposed, sample-based piano emulation is still widely used for its computational efficiency and relative accuracy, despite its significant memory storage requirements. This paper proposes a novel hybrid approach to sample-based piano synthesis aimed at improving the fidelity of sound emulation while reducing the memory required to store samples. A neural network-based model processes the sound recorded from a single piano key played at a given velocity. The network is trained to learn the nonlinear relationship between the velocity at which a piano key is pressed and the corresponding changes in its sound. Results show that the method achieves high accuracy using a specific, computationally efficient neural architecture with few trainable parameters, and that it requires memory for only one sample per piano key.
Differentiable Scattering Delay Networks for Artificial Reverberation
Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
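
Energy-decay features like those mentioned above can be expressed entirely with differentiable operations (squares, cumulative sums, logarithms). The sketch below computes the Schroeder energy decay curve (EDC), a T30-style reverberation-time estimate, and a simple EDC-matching loss on toy exponentially decaying responses. It is plain Python for readability; a differentiable SDN would need the same operations in an autodiff framework, and the specific loss form here is an assumption, not the paper's.

```python
import math


def edc_db(h):
    """Schroeder backward integration, normalized to 0 dB at t = 0."""
    acc, edc = 0.0, [0.0] * len(h)
    for n in range(len(h) - 1, -1, -1):
        acc += h[n] * h[n]
        edc[n] = acc
    total = edc[0]
    return [10.0 * math.log10(e / total + 1e-300) for e in edc]


def t60_from_edc(edc, fs, hi=-5.0, lo=-35.0):
    """Least-squares line fit to the EDC between hi and lo dB, extrapolated to -60 dB."""
    pts = [(n / fs, e) for n, e in enumerate(edc) if lo <= e <= hi]
    mt = sum(t for t, _ in pts) / len(pts)
    me = sum(e for _, e in pts) / len(pts)
    slope = sum((t - mt) * (e - me) for t, e in pts) / sum((t - mt) ** 2 for t, _ in pts)
    return -60.0 / slope


def edc_loss(h_est, h_target):
    """Hypothetical loss: mean absolute EDC difference in dB over the common length."""
    n = min(len(h_est), len(h_target))
    a, b = edc_db(h_est[:n]), edc_db(h_target[:n])
    return sum(abs(x - y) for x, y in zip(a, b)) / n


def exp_decay(t60, fs, seconds):
    """Toy exponentially decaying 'RIR' with a known T60."""
    r = 10.0 ** (-3.0 / (fs * t60))
    return [r ** n for n in range(int(seconds * fs))]


fs = 8_000
target = exp_decay(0.5, fs, 1.5)
estimate = exp_decay(0.4, fs, 1.5)
print(t60_from_edc(edc_db(target), fs), edc_loss(estimate, target))
```

Applying the same functions per frequency band, after a filterbank that is not shown, would give the frequency-dependent reverberation-time targets the abstract refers to.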
Non-Iterative Numerical Simulation in Virtual Analog: A Framework Incorporating Current Trends
Thanks to their low and constant computational cost, non-iterative methods for solving differential problems are gaining popularity in virtual analog, provided that their stability properties and accuracy allow them to be used without excessive temporal oversampling. At least in some application case studies, one recent family of non-iterative schemes has shown the potential to outperform methods that achieve accurate results at the cost of iterating several times while converging to the numerical solution. Here, this family is contextualized and studied against known classes of non-iterative methods. The results of these studies foster a more general discussion about the possibilities, role, and prospective use of non-iterative methods in virtual analog.
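
To illustrate the trade-off discussed above, the sketch below compares an iterative solver with a classic non-iterative one on a hypothetical diode-clipper ODE, C dv/dt = (u - v)/R - 2 I_s sinh(v/V_T). It contrasts backward Euler solved with Newton-Raphson to convergence against the linearly implicit (Rosenbrock-type) Euler method, which takes exactly one Newton step per sample. Linearly implicit Euler is a textbook example chosen for brevity; it is not the recent family of schemes studied in the paper, and the component values are illustrative.

```python
import math

# Hypothetical diode-clipper values (RC low-pass with antiparallel diodes to ground).
R, C, IS, VT = 2.2e3, 10e-9, 2.52e-9, 25.85e-3


def f(v, u):
    """dv/dt of the diode clipper for capacitor voltage v and input u."""
    return (u - v) / (R * C) - (2.0 * IS / C) * math.sinh(v / VT)


def dfdv(v):
    """Partial derivative of f with respect to v."""
    return -1.0 / (R * C) - (2.0 * IS / (C * VT)) * math.cosh(v / VT)


def backward_euler_newton(u, h, tol=1e-10, max_iter=50):
    """Iterative reference: backward Euler solved by Newton-Raphson at every sample."""
    v, out, iters = 0.0, [], 0
    for un in u:
        x = v
        for _ in range(max_iter):
            step = (x - v - h * f(x, un)) / (1.0 - h * dfdv(x))
            x -= step
            iters += 1
            if abs(step) < tol:
                break
        v = x
        out.append(v)
    return out, iters


def linearly_implicit_euler(u, h):
    """Non-iterative: exactly one Newton step of backward Euler per sample."""
    v, out = 0.0, []
    for un in u:
        v = v + h * f(v, un) / (1.0 - h * dfdv(v))
        out.append(v)
    return out


fs = 48_000
u = [2.0 * math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(960)]
ref, newton_iters = backward_euler_newton(u, 1.0 / fs)
fast = linearly_implicit_euler(u, 1.0 / fs)
print(newton_iters / len(u), max(abs(a - b) for a, b in zip(ref, fast)))
```

The non-iterative update has a fixed cost per sample, while the Newton loop's cost varies with the signal. How closely the one-step result tracks the converged one depends on how much the solution changes per sample, which is why the sampling rate (oversampling) matters for such schemes.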
A Wavelet-Based Method for the Estimation of Clarity of Attack Parameters in Non-Percussive Instruments
From the exploration of databases of instrument sounds to the self-assisted practice of musical instruments, methods for automatically and objectively assessing the quality of musical tones are in high demand. In this paper, we develop a new algorithm for estimating the duration of the attack, with particular attention to wind and bowed string instruments. For these instruments, tone quality is strongly influenced by attack clarity, and the attack duration, together with pitch stability, is an indicator that teachers often assess by ear. Since the initial preponderance of excitation noise makes it difficult to estimate the attack duration directly from the sound, we propose a more robust approach based on separating the ensemble of harmonics from the excitation noise by means of an improved pitch-synchronous wavelet transform. We also define a new parameter, the noise ducking time, which is relevant for detecting the extent of the noise component in the attack. To test our algorithm, in addition to exploring available sound databases, we created an annotated data set that includes several problematic sounds. Moreover, to check the consistency and robustness of our duration estimates, we applied our algorithm to sets of synthetic sounds with noisy attacks of programmable duration.
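
The validation strategy described above, using synthetic sounds whose attack duration is known by construction, can be sketched as follows. The estimator is a deliberately naive baseline (10 %/90 % threshold crossings on a one-period peak envelope), not the paper's pitch-synchronous wavelet method. The partial amplitudes, the thresholds, and the exponentially decaying noise burst standing in for excitation noise are assumptions. Comparing the clean and noisy estimates illustrates how an initial noise burst can bias a direct, threshold-based estimate.

```python
import math
import random


def synth_tone(fs, f0, attack, dur, noise=0.0, noise_decay=0.02, seed=0):
    """Harmonic tone with a linear attack of programmable duration plus an optional noise burst."""
    rng = random.Random(seed)
    partials = ((1, 1.0), (2, 0.5), (3, 0.25))     # hypothetical partial amplitudes
    out = []
    for n in range(int(dur * fs)):
        t = n / fs
        env = min(t / attack, 1.0)                  # linear attack ramp
        tone = sum(a * math.sin(2.0 * math.pi * k * f0 * t) for k, a in partials)
        burst = noise * math.exp(-t / noise_decay) * rng.uniform(-1.0, 1.0)
        out.append(env * tone + burst)
    return out


def attack_duration(x, fs, f0, lo=0.1, hi=0.9):
    """Baseline estimate: rise time of a one-period peak envelope, rescaled to 0-100 %."""
    w = math.ceil(fs / f0)                          # window of one fundamental period
    env = [max(abs(s) for s in x[max(0, n - w + 1):n + 1]) for n in range(len(x))]
    peak = max(env)
    n_lo = next(n for n, e in enumerate(env) if e >= lo * peak)
    n_hi = next(n for n, e in enumerate(env) if e >= hi * peak)
    return (n_hi - n_lo) / fs / (hi - lo)           # assumes a linear ramp


fs, f0 = 16_000, 440.0
clean = synth_tone(fs, f0, attack=0.1, dur=0.3)
noisy = synth_tone(fs, f0, attack=0.1, dur=0.3, noise=0.5)
print(attack_duration(clean, fs, f0), attack_duration(noisy, fs, f0))
```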
Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility
Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This paper demonstrates how existing GUI elements can be translated into an interactive auditory domain using high-order Ambisonics and inertial-sensor-based head tracking, culminating in real-time binaural rendering over headphones. The proposed system is designed to spatialize the auditory output of VoiceOver, the built-in macOS screen reader, aiming to foster clearer mental mapping and easier navigation. A between-groups experiment was conducted to compare standard VoiceOver with the proposed spatialized version. Participants without visual impairments (n = 32), with no visual access to the test interface, completed a list-based exploration and then attempted to reconstruct the UI solely from auditory cues. Experimental results indicate that the head-tracked group achieved slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant differences in self-reported workload or usability. These findings suggest that integrating head-tracked binaural audio into mainstream screen-reader workflows may be beneficial, but future investigations involving blind and low-vision users are needed. Although the experimental testbed uses a generic desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio approach could benefit visually impaired producers and musicians navigating plug-in controls.
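
A minimal sketch of the spatialization chain, under simplifying assumptions: a hypothetical mapping from normalized on-screen position to a direction, first-order (rather than high-order) Ambisonics encoding in ACN channel order with SN3D normalization, and yaw-only head-tracking compensation that rotates the sound field opposite to the tracked head rotation so sources stay fixed in space. Binaural decoding (for example through HRTF-based virtual loudspeakers) is omitted, and none of the names correspond to VoiceOver or macOS APIs.

```python
import math


def ui_to_direction(x_norm, y_norm, max_az=60.0, max_el=30.0):
    """Hypothetical mapping: screen left/right -> azimuth, top/bottom -> elevation (degrees)."""
    az = max_az * (1.0 - 2.0 * x_norm)   # left edge -> +60 deg (positive azimuth = left)
    el = max_el * (1.0 - 2.0 * y_norm)   # top edge -> +30 deg
    return az, el


def encode_foa(sample, az_deg, el_deg):
    """First-order Ambisonics, ACN channel order (W, Y, Z, X), SN3D normalization."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (sample,
            sample * math.sin(az) * math.cos(el),
            sample * math.sin(el),
            sample * math.cos(az) * math.cos(el))


def compensate_yaw(foa, yaw_deg):
    """Rotate the field by -yaw (positive yaw = head turned left), keeping sources world-fixed."""
    w, y, z, x = foa
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return (w, -x * s + y * c, z, x * c + y * s)


# Usage: a control near the top-left of the window, listener's head turned 15 deg left.
az, el = ui_to_direction(0.2, 0.1)
frame = encode_foa(0.5, az, el)          # one sample of the screen-reader voice
print(compensate_yaw(frame, 15.0))
```

A source at azimuth θ ends up at θ - ψ relative to the listener after a yaw ψ, which is what keeps each interface element anchored in space while the head moves.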
Real-Time Virtual Analog Modelling of Diode-Based VCAs
Some early analog voltage-controlled amplifiers (VCAs) used semiconductor diodes as a variable-gain element. Diode-based VCAs exhibit a unique sound quality, with distortion that depends on both signal level and gain control. In this work, we examine the behavior of a simplified circuit for a diode-based VCA and propose a nonlinear, explicit, stateless digital model. This approach avoids traditional iterative algorithms, which can be computationally intensive. The resulting digital model retains the sonic characteristics of the analog model and is suitable for real-time simulation. We present an analysis of the gain characteristics and harmonic distortion produced by this model, as well as practical guidance for implementation. We apply this approach to a set of alternative analog topologies and introduce a family of digital VCA models based on fixed nonlinearities with variable operating points.
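
The idea of a fixed nonlinearity driven at a variable operating point can be illustrated in a few lines. This sketch assumes tanh as the fixed curve purely for illustration (the paper derives its nonlinearities from diode circuits). The control input u shifts the operating point, so the small-signal gain becomes f'(u); for u ≠ 0 the curve is no longer odd-symmetric around the operating point, so even-order distortion appears. Gain and distortion therefore both depend on signal level and control. The model is explicit and stateless: one function evaluation per sample, no iteration.

```python
import math


def vca(x, u, f=math.tanh):
    """Hypothetical VCA: fixed nonlinearity f evaluated around operating point u.

    Referencing the output to f(u) means silence in gives silence out.
    """
    return f(u + x) - f(u)


def small_signal_gain(u):
    """Derivative of tanh at the operating point: sech^2(u)."""
    return 1.0 / math.cosh(u) ** 2


# Gain falls as the operating point moves away from 0; the asymmetry term
# vca(a, u) + vca(-a, u) is zero only for u = 0 (no even-order distortion).
for u in (0.0, 0.5, 1.0, 2.0):
    print(u, small_signal_gain(u), vca(0.5, u) + vca(-0.5, u))
```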
Stable Limit Cycles as Tunable Signal Sources
This paper presents a method for synthesizing audio signals from nonlinear dynamical systems exhibiting stable limit cycles, with control over frequency and amplitude that is independent of changes to the system's internal parameters. Using the van der Pol oscillator and the Brusselator as case studies, we demonstrate how the system parameters are decoupled from frequency and amplitude by rescaling the angular frequency and normalizing the amplitude extrema. Practical implementation considerations are discussed, as are the limits and challenges of this approach. The method's validity is evaluated experimentally, and synthesis examples show the application of tunable nonlinear oscillators in sound design, including the generation of transients in FM synthesis by means of a van der Pol oscillator and a Supersaw oscillator bank based on the Brusselator.
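
A minimal sketch of the decoupling idea for the van der Pol oscillator, x'' - μ(1 - x²)x' + x = 0, under stated assumptions. The natural period and peak of the limit cycle are first measured numerically for a given μ. The integration step is then scaled so that one limit-cycle period spans f_s/f output samples, and the output is divided by the measured peak. RK4 integration, the measurement settings, and the class name are illustrative choices, not necessarily the paper's; the Brusselator case is not covered.

```python
import math


def vdp(x, v, mu):
    """van der Pol: x'' - mu (1 - x^2) x' + x = 0 as a first-order system."""
    return v, mu * (1.0 - x * x) * v - x


def rk4(x, v, mu, h):
    """One classical Runge-Kutta step."""
    k1 = vdp(x, v, mu)
    k2 = vdp(x + 0.5 * h * k1[0], v + 0.5 * h * k1[1], mu)
    k3 = vdp(x + 0.5 * h * k2[0], v + 0.5 * h * k2[1], mu)
    k4 = vdp(x + h * k3[0], v + h * k3[1], mu)
    return (x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            v + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)


def upward_crossings(xs, dt):
    """Linearly interpolated times of upward zero crossings."""
    return [(i + xs[i] / (xs[i] - xs[i + 1])) * dt
            for i in range(len(xs) - 1) if xs[i] < 0.0 <= xs[i + 1]]


def measure_limit_cycle(mu, h=2e-3, t_total=80.0, t_skip=40.0):
    """Natural period and peak |x| of the limit cycle, in the oscillator's own time units."""
    x, v, xs = 2.0, 0.0, []
    for n in range(int(t_total / h)):
        x, v = rk4(x, v, mu, h)
        if n * h >= t_skip:                  # discard the initial transient
            xs.append(x)
    tc = upward_crossings(xs, h)
    return (tc[-1] - tc[0]) / (len(tc) - 1), max(abs(s) for s in xs)


class TunableVdP:
    """Hypothetical wrapper: output frequency and amplitude set directly, independent of mu."""

    def __init__(self, mu, fs):
        self.mu, self.fs = mu, fs
        self.period, self.peak = measure_limit_cycle(mu)
        self.x, self.v = self.peak, 0.0      # x is extremal (v = 0) on the limit cycle

    def render(self, freq, amp, n):
        h = self.period * freq / self.fs     # oscillator time advanced per output sample
        out = []
        for _ in range(n):
            self.x, self.v = rk4(self.x, self.v, self.mu, h)
            out.append(amp * self.x / self.peak)   # normalize the amplitude extrema
        return out


fs = 48_000
osc = TunableVdP(mu=2.0, fs=fs)
y = osc.render(freq=220.0, amp=0.8, n=fs)
```

Changing `mu` alters the waveform shape (more relaxation-like for larger values) but, with this rescaling, not the nominal frequency or peak level. Very large `mu` makes the system stiff, and an explicit RK4 step may then need oversampling to remain accurate.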