Training Neural Models of Nonlinear Multi-Port Elements Within Wave Digital Structures Through Discrete-Time Simulation
Neural networks have been applied within the Wave Digital Filter (WDF) framework as data-driven models for nonlinear multi-port circuit elements. Conventionally, these models are trained on wave variables obtained by sampling the current-voltage characteristic of the considered nonlinear element before being incorporated into the circuit WDF implementation. However, isolating multi-port elements for this process can be challenging, as their nonlinear behavior often depends on dynamic effects that emerge from interactions with the surrounding circuit. In this paper, we propose a novel approach for training neural models of nonlinear multi-port elements directly within a circuit’s Wave Digital (WD) discrete-time implementation, relying solely on circuit input-output voltage measurements. Exploiting the differentiability of WD simulations, we embed the neural network into the simulation process and optimize its parameters using gradient-based methods by minimizing a loss function defined over the circuit output voltage. Experimental results demonstrate the effectiveness of the proposed approach in accurately capturing the nonlinear circuit behavior, while preserving the interpretability and modularity of WDFs.
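As a rough illustration of the conventional data-preparation step mentioned above (sampling a current-voltage characteristic and converting it to wave variables), the sketch below uses the standard voltage-wave definitions a = v + R i and b = v - R i. The one-port Shockley diode, its parameters `i_s` and `v_t`, and the function names are assumptions chosen for illustration; they are not taken from the paper, which targets multi-port elements.

```python
import math

def diode_current(v, i_s=1e-12, v_t=0.02585):
    """Shockley diode law; i_s and v_t are illustrative assumptions."""
    return i_s * (math.exp(v / v_t) - 1.0)

def iv_to_wave_pairs(voltages, port_resistance):
    """Convert sampled (v, i) points into (incident, reflected) wave pairs.

    Uses the voltage-wave definitions a = v + R*i and b = v - R*i,
    where R is the (free) port resistance.
    """
    pairs = []
    for v in voltages:
        i = diode_current(v)
        a = v + port_resistance * i
        b = v - port_resistance * i
        pairs.append((a, b))
    return pairs

# Hypothetical usage: a small training set of (a, b) pairs for a 1 kOhm port.
pairs = iv_to_wave_pairs([0.0, 0.25, 0.5], 1000.0)
```

Pairs of this kind would serve as input-target examples for a network mapping incident to reflected waves. The approach proposed in the abstract avoids this isolation step by training inside the circuit simulation instead.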
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single and potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced feedback delay network (FDN) is designed to generate perceptually plausible room impulse responses (RIRs). A multiple stimuli with hidden reference and anchor (MUSHRA) test and a two-alternative forced-choice (2AFC) discrimination task have been conducted to compare the proposed method against ground-truth recordings and conventional RT-based approaches. The results show that the proposed system delivers robust performance in various scenarios, achieving highly plausible reverberation synthesis.
Piano-SSM: Diagonal State Space Models for Efficient Midi-to-Raw Audio Synthesis
Deep State Space Models (SSMs) have shown remarkable performance in long-sequence reasoning tasks such as raw audio classification and audio generation. This paper introduces Piano-SSM, an end-to-end deep SSM neural network architecture designed to synthesize raw piano audio directly from MIDI input. The network requires no intermediate representations or domain-specific expert knowledge, simplifying training and improving accessibility. Quantitative evaluations on the MAESTRO dataset show that Piano-SSM achieves a Multi-Scale Spectral Loss (MSSL) of 7.02 at 16 kHz, outperforming DDSP-Piano v1, which has an MSSL of 7.09. At 24 kHz, Piano-SSM maintains competitive performance with an MSSL of 6.75, closely matching DDSP-Piano v2’s result of 6.58. Evaluations on the MAPS dataset achieve an MSSL score of 8.23, which demonstrates the generalization capability even when training with very limited data. Further analysis highlights Piano-SSM’s ability to train on high sampling-rate audio while synthesizing audio at lower sampling rates, explicitly linking performance loss to aliasing effects. Additionally, the proposed model facilitates real-time causal inference through a custom C++17 header-only implementation. With single-core inference on an Intel Core i7-12700 processor at 4.5 GHz, the largest network synthesizes one second of audio at 44.1 kHz in 0.44 s, with a workload of 23.1 GFLOP/s and a 10.1 µs input/output delay, while the smallest network at 16 kHz needs only 0.04 s, with 2.3 GFLOP/s and a 2.6 µs input/output delay. These results underscore Piano-SSM’s practical utility and efficiency in real-time audio synthesis applications.
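To make the term "diagonal state space model" concrete, here is a minimal sketch of a single discrete-time diagonal SSM recurrence. It is a generic textbook form, not the Piano-SSM architecture: the function name, the single-input single-output shape, and the parameter names `a`, `b`, `c` are assumptions for illustration. Because the state matrix is diagonal, each state is updated independently, which is what makes sample-by-sample causal inference cheap.

```python
def diagonal_ssm(u, a, b, c):
    """Run a single-input, single-output diagonal state space model.

    x[k] = a * x[k-1] + b * u[k]   (element-wise: the state matrix is diagonal)
    y[k] = Re(sum(c * x[k]))

    a, b, c are equal-length sequences of (possibly complex) coefficients.
    """
    x = [0j] * len(a)  # state starts at rest
    y = []
    for sample in u:
        # Each state only depends on its own previous value and the input.
        x = [ai * xi + bi * sample for ai, xi, bi in zip(a, x, b)]
        y.append(sum(ci * xi for ci, xi in zip(c, x)).real)
    return y

# Hypothetical usage: one real pole at 0.5 driven by a unit impulse.
impulse_response = diagonal_ssm([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
```

The per-sample cost grows linearly with the number of states, rather than quadratically as with a dense state matrix.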
Neural Sample-Based Piano Synthesis
Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physics-based piano models have been proposed, sample-based piano emulation is still widely used for its computational efficiency and relative accuracy, despite its significant memory storage requirements. This paper proposes a novel hybrid approach to sample-based piano synthesis aimed at improving the fidelity of sound emulation while reducing the memory required for storing samples. A neural network-based model processes the sound recorded from a single example of a piano key at a given velocity. The network is trained to learn the nonlinear relationship between the various velocities at which a piano key is pressed and the corresponding sound alterations. Results show that the method achieves high accuracy using a specific neural architecture that is computationally efficient, has few trainable parameters, and requires memory for only one sample per piano key.
Differentiable Scattering Delay Networks for Artificial Reverberation
Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
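One of the perceptually relevant features named above, the energy decay, is commonly computed by Schroeder backward integration. The sketch below shows that computation and a simple decay-curve mismatch measure. The function names and the mean-absolute-difference loss are assumptions for illustration; the abstract does not specify the exact loss used. This plain-Python version is not itself differentiable, although the same operations could be written in an automatic differentiation framework.

```python
import math

def energy_decay_curve_db(rir, floor=1e-12):
    """Schroeder backward integration, normalized to 0 dB at time zero."""
    energy = 0.0
    edc = [0.0] * len(rir)
    # Accumulate squared samples from the end of the response backwards.
    for n in range(len(rir) - 1, -1, -1):
        energy += rir[n] ** 2
        edc[n] = energy
    total = max(edc[0], floor)
    return [10.0 * math.log10(max(e / total, floor)) for e in edc]

def edc_loss(rir_pred, rir_target):
    """Mean absolute difference between two decay curves, in dB."""
    pred = energy_decay_curve_db(rir_pred)
    target = energy_decay_curve_db(rir_target)
    n = min(len(pred), len(target))
    return sum(abs(p - t) for p, t in zip(pred[:n], target[:n])) / n

# Hypothetical usage with two toy exponentially decaying responses.
fast = [0.5 ** n for n in range(32)]
slow = [0.9 ** n for n in range(32)]
mismatch = edc_loss(fast, slow)
```

Comparing decay curves rather than raw samples makes the measure insensitive to the fine structure of individual reflections, which fits the abstract's emphasis on perceptually relevant features.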
Non-Iterative Numerical Simulation in Virtual Analog: A Framework Incorporating Current Trends
Owing to their low and constant computational cost, non-iterative methods for the solution of differential problems are gaining popularity in virtual analog, provided that their stability and accuracy allow their use without excessive temporal oversampling. At least in some application case studies, one recent family of non-iterative schemes has shown promise to outperform methods that achieve accurate results at the cost of iterating several times while converging to the numerical solution. Here, this family is contextualized and studied against known classes of non-iterative methods. The results from these studies foster a more general discussion about the possibilities, role and prospective use of non-iterative methods in virtual analog.
A Wavelet-Based Method for the Estimation of Clarity of Attack Parameters in Non-Percussive Instruments
From the exploration of databases of instrument sounds to the self-assisted practice of musical instruments, methods for automatically and objectively assessing the quality of musical tones are in high demand. In this paper, we develop a new algorithm for estimating the duration of the attack, with particular attention to wind and bowed string instruments. For these instruments, the quality of the tones is highly influenced by the attack clarity, for which the attack duration, together with pitch stability, is an indicator that teachers often judge by ear. Since the direct estimation of the attack duration from sounds is made difficult by the initial preponderance of the excitation noise, we propose a more robust approach based on the separation of the ensemble of the harmonics from the excitation noise, which is obtained by means of an improved pitch-synchronous wavelet transform. We also define a new parameter, the noise ducking time, which is relevant for detecting the extent of the noise component in the attack. In addition to the exploration of available sound databases, for testing our algorithm, we created an annotated data set in which several problematic sounds are included. Moreover, to check the consistency and robustness of our duration estimates, we applied our algorithm to sets of synthetic sounds with noisy attacks of programmable duration.
Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility
Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This paper demonstrates how existing GUI elements can be translated into an interactive auditory domain using high-order Ambisonics and inertial sensor-based head tracking, culminating in real-time binaural rendering over headphones. The proposed system is designed to spatialize the auditory output from VoiceOver, the built-in macOS screen reader, aiming to foster clearer mental mapping and enhanced navigability. A between-groups experiment was conducted to compare standard VoiceOver with the proposed spatialized version. Participants without visual impairment (n = 32), with no visual access to the test interface, completed a list-based exploration and then attempted to reconstruct the UI solely from auditory cues. Experimental results indicate that the head-tracked group achieved a slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant differences in self-reported workload or usability. These findings suggest that potential benefits may come from the integration of head-tracked binaural audio into mainstream screen-reader workflows, but future investigations involving blind and low-vision users are needed. Although the experimental testbed uses a generic desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio approach could benefit visually impaired producers and musicians navigating plug-in controls.
Real-Time Virtual Analog Modelling of Diode-Based VCAs
Some early analog voltage-controlled amplifiers (VCAs) utilized semiconductor diodes as a variable-gain element. Diode-based VCAs exhibit a unique sound quality, with distortion dependent both on signal level and gain control. In this work, we examine the behavior of a simplified circuit for a diode-based VCA and propose a nonlinear, explicit, stateless digital model. This approach avoids traditional iterative algorithms, which can be computationally intensive. The resulting digital model retains the sonic characteristics of the analog model and is suitable for real-time simulation. We present an analysis of the gain characteristics and harmonic distortion produced by this model, as well as practical guidance for implementation. We apply this approach to a set of alternative analog topologies and introduce a family of digital VCA models based on fixed nonlinearities with variable operating points.
Stable Limit Cycles as Tunable Signal Sources
This paper presents a method for synthesizing audio signals from nonlinear dynamical systems exhibiting stable limit cycles, with control over frequency and amplitude independent of changes to the system’s internal parameters. Using the van der Pol oscillator and the Brusselator as case studies, it is demonstrated how parameters are decoupled from frequency and amplitude by rescaling the angular frequency and normalizing amplitude extrema. Practical implementation considerations are discussed, as are the limits and challenges of this approach. The method’s validity is evaluated experimentally and synthesis examples show the application of tunable nonlinear oscillators in sound design, including the generation of transients in FM synthesis by means of a van der Pol oscillator and a Supersaw oscillator bank based on the Brusselator.
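The general idea of rescaling the angular frequency and normalizing amplitude can be sketched with the van der Pol oscillator as follows. This is a simplified illustration under stated assumptions, not the paper's method: the function name, the fourth-order Runge-Kutta integrator, the small damping value `mu`, and the offline peak normalization over the whole buffer are all choices made here. The tuning is also only approximate, since the true period of the van der Pol limit cycle drifts away from 2π as `mu` grows.

```python
import math

def van_der_pol(freq_hz, mu=0.1, sample_rate=48000, n_samples=48000):
    """Van der Pol oscillator with time rescaled by omega = 2*pi*freq_hz.

    In rescaled time the system is x'' - mu*(1 - x^2)*x' + x = 0, whose
    limit cycle has a period close to 2*pi for small mu (an assumption).
    """
    omega = 2.0 * math.pi * freq_hz
    dt = 1.0 / sample_rate

    def deriv(x, y):
        # First-order form, with both equations scaled by omega.
        return omega * y, omega * (mu * (1.0 - x * x) * y - x)

    x, y = 2.0, 0.0  # start near the limit cycle (amplitude about 2)
    out = []
    for _ in range(n_samples):
        out.append(x)
        # Classic fourth-order Runge-Kutta step.
        k1x, k1y = deriv(x, y)
        k2x, k2y = deriv(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
        k3x, k3y = deriv(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
        k4x, k4y = deriv(x + dt * k3x, y + dt * k3y)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        y += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0

    # Offline amplitude normalization by the measured extremum.
    peak = max(abs(v) for v in out)
    return [v / peak for v in out]

# Hypothetical usage: one second of a roughly 100 Hz tone at 48 kHz.
signal = van_der_pol(100.0)
```

A real-time implementation would need a running estimate of the extrema instead of a whole-buffer peak, and larger values of `mu` would require a correction to the frequency scaling.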