A Hierarchical Deep Learning Approach for Minority Instrument Detection
Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability. Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level. Based on the Hornbostel-Sachs classification, such a hierarchical classification system is evaluated using the MedleyDB dataset, renowned for its diversity and richness concerning various instruments and music genres. This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction. This study showcases more reliable coarse-level instrument detection by bridging the gap between detailed instrument identification and group-level recognition, paving the way for further advancements in this domain.
Wave Digital Model of the MXR Phase 90 Based on a Time-Varying Resistor Approximation of JFET Elements
Virtual Analog (VA) modeling is the practice of digitally emulating analog audio gear. Over the past few years, with the aim of recreating the allegedly distinctive sound of audio equipment and musicians, many different guitar pedals have been emulated by means of the VA paradigm, but little attention has been given to phasers. Phasers process the spectrum of the input signal with time-varying notches by means of phase-shifting stages typically realized with a network of transistors, whose nonlinear equations are, in general, demanding to solve. In this paper, we take as a reference the famous MXR Phase 90 guitar pedal, and we propose an efficient time-varying model of its Junction Field-Effect Transistors (JFETs) based on a channel resistance approximation. We then employ this model in the Wave Digital domain to emulate the guitar pedal in real time, obtaining an implementation characterized by low computational cost and good accuracy.
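As a rough illustration of the idea of time-varying notches produced by phase-shifting stages, the sketch below cascades first-order digital all-pass filters whose break frequency is set by a time-varying resistance, then mixes the result with the dry signal. It is not the Wave Digital model of the paper: the bilinear-transform all-pass, the function names, the capacitor value, and the resistance sweep are all assumptions chosen for illustration only.

```python
import math

def allpass_coeff(r_ohm, c_farad, fs):
    """Coefficient of a first-order digital all-pass whose break frequency is
    1 / (2*pi*R*C), obtained with the bilinear transform."""
    fc = 1.0 / (2.0 * math.pi * r_ohm * c_farad)
    t = math.tan(math.pi * fc / fs)
    return (t - 1.0) / (t + 1.0)

def phaser(x, fs, resistance, c_farad=47e-9, stages=4):
    """Toy phaser: `resistance(n)` returns the (hypothetical) channel
    resistance in ohms at sample n; every stage shares that value."""
    x1 = [0.0] * stages  # previous input of each stage
    y1 = [0.0] * stages  # previous output of each stage
    out = []
    for n, sample in enumerate(x):
        a = allpass_coeff(resistance(n), c_farad, fs)
        s = sample
        for k in range(stages):
            # H(z) = (a + z^-1) / (1 + a z^-1)
            y = a * s + x1[k] - a * y1[k]
            x1[k], y1[k] = s, y
            s = y
        # Equal mix of dry and phase-shifted signal creates the notches.
        out.append(0.5 * (sample + s))
    return out

fs = 48000.0
# Placeholder sweep: resistance oscillating between 10 kOhm and 30 kOhm at 0.5 Hz.
lfo_resistance = lambda n: 20e3 + 10e3 * math.sin(2.0 * math.pi * 0.5 * n / fs)
tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(4800)]
swept = phaser(tone, fs, lfo_resistance)
```

With four stages and a fixed resistance, the mix has two notches, located where the total phase shift of the cascade is an odd multiple of 180 degrees; sweeping the resistance moves them across the spectrum.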
Hyper Recurrent Neural Network: Condition Mechanisms for Black-Box Audio Effect Modeling
Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplications and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the knobs for an RNN-based model, existing approaches integrate control parameters by concatenating them channel-wise with some intermediate representation of the input signal. While this method is parameter-efficient, there is room to further improve the quality of generated audio because the concatenation-based conditioning method has limited capacity in modulating signals. In this paper, we propose three novel conditioning mechanisms for RNNs, tailored for black-box virtual analog modeling. These advanced conditioning mechanisms modulate the model based on control parameters, yielding superior results to existing RNN- and CNN-based architectures across various evaluation metrics.
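The concatenation-based baseline described above can be sketched as follows. This is a minimal pure-Python toy, not any of the proposed mechanisms: it assumes the "intermediate representation" is simply the raw audio sample, and the network size, random weights, and knob values are all hypothetical.

```python
import math
import random

def rnn_with_concat_conditioning(audio, knobs, W_in, W_rec, w_out):
    """Tiny tanh RNN whose per-sample input is [audio sample, knob values...]."""
    hidden = [0.0] * len(W_rec)
    out = []
    for sample in audio:
        u = [sample] + list(knobs)  # concatenation-based conditioning
        new_hidden = []
        for i in range(len(hidden)):
            acc = sum(W_in[i][j] * u[j] for j in range(len(u)))
            acc += sum(W_rec[i][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(acc))
        hidden = new_hidden
        out.append(sum(w * h for w, h in zip(w_out, hidden)))
    return out

rng = random.Random(0)
n_hidden, n_knobs = 4, 2
# Untrained placeholder weights; a real model would learn these from data.
W_in = [[rng.uniform(-1, 1) for _ in range(1 + n_knobs)] for _ in range(n_hidden)]
W_rec = [[0.5 * rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]

audio = [math.sin(2.0 * math.pi * n / 32.0) for n in range(64)]
y_a = rnn_with_concat_conditioning(audio, [0.2, 0.8], W_in, W_rec, w_out)
y_b = rnn_with_concat_conditioning(audio, [0.9, 0.1], W_in, W_rec, w_out)
```

In this scheme the knob values only add a constant offset to the pre-activations of the first layer, which gives some intuition for why the abstract describes its capacity to modulate the signal as limited.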
Revisiting the Second-Order Accurate Non-Iterative Discretization Scheme
In the field of virtual analog modeling, a variety of methods have been proposed to systematically derive simulation models from circuit schematics. However, they typically rely on implicit numerical methods to transform the differential equations governing the circuit to difference equations suitable for simulation. For circuits with non-linear elements, this usually means that a non-linear equation has to be solved at run-time at high computational cost. As an alternative to fully-implicit numerical methods, a family of non-iterative discretization schemes has recently been proposed, allowing a significant reduction of the computational load. However, in the original presentation, several assumptions are made regarding the structure of the ODE, limiting the generality of these schemes. Here, we show that for the second-order accurate variant in particular, the method is applicable to general ODEs. Furthermore, we point out an interesting connection to the implicit midpoint method.
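For reference, the implicit midpoint method mentioned above advances x' = f(x) by solving x_{n+1} = x_n + h f((x_n + x_{n+1}) / 2) for x_{n+1}. The sketch below is a plain iterative implementation for a scalar ODE, shown only to illustrate the run-time nonlinear solve that fully implicit methods require; it is not the non-iterative scheme discussed in the paper. The test problem, step sizes, and function names are assumptions.

```python
import math

def implicit_midpoint_step(f, dfdx, x, h, tol=1e-12, max_iter=50):
    """One implicit midpoint step for x' = f(x), solved with Newton's method.

    Solves g(y) = y - x - h * f((x + y) / 2) = 0 for y = x_{n+1}.
    """
    y = x + h * f(x)  # explicit Euler predictor as the initial guess
    for _ in range(max_iter):
        mid = 0.5 * (x + y)
        g = y - x - h * f(mid)
        dg = 1.0 - 0.5 * h * dfdx(mid)
        step = g / dg
        y -= step
        if abs(step) < tol:
            break
    return y

# Hypothetical test problem: x' = -x**3, x(0) = 1, exact solution 1 / sqrt(1 + 2 t).
f = lambda x: -x ** 3
dfdx = lambda x: -3.0 * x ** 2

def integrate(h, t_end=1.0):
    x = 1.0
    for _ in range(round(t_end / h)):
        x = implicit_midpoint_step(f, dfdx, x, h)
    return x

exact = 1.0 / math.sqrt(3.0)
err_coarse = abs(integrate(0.1) - exact)
err_fine = abs(integrate(0.05) - exact)
```

Halving the step size should reduce the error by roughly a factor of four, which is what second-order accuracy means in practice. The Newton loop inside each step is the cost that non-iterative schemes aim to avoid.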
Digitizing the Schumann PLL Analog Harmonizer
The Schumann Electronics PLL is a guitar effect that uses hardware-based processing of one-bit digital signals, with op-amp saturation and CMOS control systems used to generate multiple square waves derived from the frequency of the input signal. The effect may be simulated in the digital domain by cascading stages of state-space virtual analog modeling and algorithmic approximations of CMOS integrated circuits. Phase-locked loops, decade counters, and Schmitt trigger inverters are modeled using logic algorithms, allowing for the comparable digital implementation of the Schumann PLL. Simulation results are presented.
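To give a flavor of what a logic-level model of such components might look like, the sketch below turns a sine wave into a one-bit signal with a hysteresis comparator (the core behavior of a Schmitt trigger) and then halves its frequency with a toggle flip-flop. This is a generic illustration, not the authors' algorithm: the thresholds, function names, and the divide-by-two stage (standing in for a more elaborate counter) are assumptions.

```python
import math

def schmitt_trigger(signal, low=-0.1, high=0.1):
    """One-bit output with hysteresis: goes high above `high`, low below `low`."""
    state = 0
    out = []
    for x in signal:
        if state == 0 and x > high:
            state = 1
        elif state == 1 and x < low:
            state = 0
        out.append(state)
    return out

def toggle_divider(bits):
    """Toggle flip-flop clocked on rising edges: output frequency is halved."""
    q, prev, out = 0, 0, []
    for b in bits:
        if b == 1 and prev == 0:  # rising edge
            q ^= 1
        prev = b
        out.append(q)
    return out

def count_rising_edges(bits):
    """Number of 0 -> 1 transitions, assuming the signal starts low."""
    prev, count = 0, 0
    for b in bits:
        if b == 1 and prev == 0:
            count += 1
        prev = b
    return count

fs = 8000
sine = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(fs)]  # 1 s at 100 Hz
square = schmitt_trigger(sine)
octave_down = toggle_divider(square)
```

Over one second of a 100 Hz input, the squared signal has 100 rising edges and the divided signal has 50, that is, a square wave one octave below the input.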
Wave Digital Modeling of Circuits with Multiple One-Port Nonlinearities Based on Lipschitz-Bounded Neural Networks
Neural networks have found application within the Wave Digital Filters (WDFs) framework as data-driven input-output blocks for modeling single one-port or multi-port nonlinear devices in circuit systems. However, traditional neural networks lack predictable bounds for their output derivatives, which are essential to ensure convergence when simulating circuits with multiple nonlinear elements using fixed-point iterative methods, e.g., the Scattering Iterative Method (SIM). In this study, we address this issue by employing Lipschitz-bounded neural networks for regressing nonlinear WD scattering relations of one-port nonlinearities.
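The link between a derivative bound and convergence can be illustrated with a toy example. The sketch below builds a small scalar network with tanh activations, rescales its output weights so that its slope is bounded by a chosen constant below one, and then runs a plain fixed-point iteration on it. It is not the SIM and not the network parameterization used in the paper: the architecture, the rescaling trick, and all names are assumptions, and the weights are random rather than trained.

```python
import math
import random

def make_bounded_mlp(hidden=8, lipschitz=0.9, seed=0):
    """Scalar one-hidden-layer tanh network with |f'(x)| <= `lipschitz`."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    # tanh has slope at most 1, so |f'(x)| <= sum_i |w2_i * w1_i|.
    bound = sum(abs(a * b) for a, b in zip(w1, w2))
    scale = lipschitz / bound
    w2 = [w * scale for w in w2]

    def f(x):
        return sum(c * math.tanh(a * x + b) for a, b, c in zip(w1, b1, w2))

    return f

def fixed_point(f, x0=0.0, iterations=500):
    """Plain fixed-point iteration x <- f(x)."""
    x = x0
    for _ in range(iterations):
        x = f(x)
    return x

f = make_bounded_mlp()
x_star = fixed_point(f)
```

Because the map is a contraction (slope bounded by 0.9), the iteration converges to a unique fixed point from any starting value. Without such a bound, the same loop could oscillate or diverge.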
Graphic Equalizers Based on Limited Action Networks
Several classic graphic equalizers, such as the Altec 9062A and the “Motown EQ,” have stepped gain controls and “proportional bandwidth” and use passive, constant-resistance, RLC circuit designs based on “limited-action networks.” These are related to bridged-T-network EQs, with several differences that bring important practical improvements and also affect their sound. We study these networks, giving their circuit topologies, design principles, and design equations, which appear not to have been published before. We make a Wave Digital Filter which can model either device or an idealized “Exact” version, to which we can add various new extensions and features.
Synthesizer Sound Matching Using Audio Spectrogram Transformers
Systems for synthesizer sound matching, which automatically set the parameters of a synthesizer to emulate an input sound, have the potential to make the process of synthesizer programming faster and easier for novice and experienced musicians alike, whilst also affording new means of interaction with synthesizers. Considering the enormous variety of synthesizers in the marketplace, and the complexity of many of them, general-purpose sound matching systems that function with minimal knowledge or prior assumptions about the underlying synthesis architecture are particularly desirable. With this in mind, we introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer. We demonstrate the viability of this model by training on a large synthetic dataset of randomly generated samples from the popular Massive synthesizer. We show that this model can reconstruct parameters of samples generated from a set of 16 parameters, highlighting its improved fidelity relative to multi-layer perceptron and convolutional neural network baselines. We also provide audio examples demonstrating the out-of-domain model performance in emulating vocal imitations, and sounds from other synthesizers and musical instruments.
Spectral Analysis of Stochastic Wavetable Synthesis
Dynamic Stochastic Wavetable Synthesis (DSWS) is a sound synthesis and processing technique that uses probabilistic waveform synthesis techniques invented by Iannis Xenakis as a modulation/distortion effect applied to a wavetable oscillator. The stochastic manipulation of the wavetable provides a means of creating signals with rich, dynamic spectra. In the present work, the DSWS technique is compared to other fundamental sound synthesis techniques such as frequency modulation synthesis. Additionally, several extensions of the DSWS technique are proposed.
Leveraging Electric Guitar Tones and Effects to Improve Robustness in Guitar Tablature Transcription Modeling
Guitar tablature transcription (GTT) aims at automatically generating symbolic representations from real solo guitar performances. Due to its applications in education and musicology, GTT has gained traction in recent years. However, GTT robustness has been limited due to the small size of available datasets. Researchers have recently used synthetic data that simulates guitar performances using pre-recorded or computer-generated tones, allowing for scalable and automatic data generation. The present study complements these efforts by demonstrating that GTT robustness can be improved by including synthetic training data created using recordings of real guitar tones played with different audio effects. We evaluate our approach on a new evaluation dataset with professional solo guitar performances that we composed and collected, featuring a wide array of tones, chords, and scales.