Combining Zeroth and First-Order Analysis With Lagrange Polynomials to Reduce Artefacts in Live Concatenative Granulation
This paper presents a technique addressing signal discontinuity and concatenation artefacts in real-time granular processing with rectangular windowing. By combining zero-crossing synchronicity, first-order derivative analysis, and Lagrange polynomials, we can generate streams of uncorrelated and non-overlapping sonic fragments with minimal discontinuities in the low-order derivatives. The resulting open-source algorithm, implemented in the Faust language, provides versatile real-time software for dynamical looping, wavetable oscillation, and granulation, with reduced artefacts from rectangular windowing and none of the artefacts introduced by the overlap-add-to-one techniques commonly deployed in granular processing.
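As a language-neutral illustration of the zeroth/first-order idea (the paper's own implementation is in Faust), the sketch below cuts grains between upward zero crossings so consecutive fragments meet near zero with matching slope sign; the function names and the simplified boundary rule are our own, and the Lagrange-polynomial smoothing of higher derivatives is omitted.

```python
import numpy as np

def upward_zero_crossings(x: np.ndarray) -> np.ndarray:
    """Indices where the signal crosses zero going upward."""
    return np.where((x[:-1] < 0) & (x[1:] >= 0))[0]

def splice_grains(source: np.ndarray, n_grains: int, seed: int = 0) -> np.ndarray:
    """Concatenate non-overlapping grains cut at upward zero crossings,
    so every junction sits near zero with a positive slope."""
    rng = np.random.default_rng(seed)
    zc = upward_zero_crossings(source)
    grains = []
    for _ in range(n_grains):
        i, j = sorted(rng.choice(len(zc), size=2, replace=False))
        grains.append(source[zc[i]:zc[j]])
    return np.concatenate(grains)

# Example: granulate one second of a 220 Hz sine at 44.1 kHz.
sr = 44100
t = np.arange(sr) / sr
stream = splice_grains(np.sin(2 * np.pi * 220 * t), n_grains=8)
```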
Alloy Sounds: Non-Repeating Sound Textures With Probabilistic Cellular Automata
Contemporary musicians commonly face the challenge of finding
new, characteristic sounds that can make their compositions more
distinct. They often resort to computers and algorithms, which can
significantly aid in creative processes by generating unexpected
material in controlled probabilistic processes. In particular, algorithms that present emergent behaviors, like genetic algorithms
and cellular automata, have fostered a broad diversity of musical explorations. This article proposes an original technique for
the computer-assisted creation and manipulation of sound textures.
The technique uses Probabilistic Cellular Automata, which are as yet seldom explored in the music domain, to blend two audio tracks
into a third, different one. The proposed blending process works
by dividing the source tracks into frequency bands and then associating each of the automaton’s cells with a frequency band. Only one
source, chosen by the cell’s state, is active within each band. The
resulting track has a non-repeating textural pattern that follows the changes in the cellular automaton. This blending process allows
the musician to choose the original material and the blend granularity, significantly changing the resulting blends. We demonstrate
how to use the proposed blending process in sound design and its
application in experimental and popular music.
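A minimal sketch of the band-wise blending idea, under our own assumptions: an elementary one-dimensional probabilistic automaton, STFT bins standing in for the frequency bands, and hard per-band source selection.

```python
import numpy as np
from scipy.signal import stft, istft

def pca_step(state, p, rng):
    # Each cell adopts the majority of its 3-cell neighborhood with probability p.
    left, right = np.roll(state, 1), np.roll(state, -1)
    majority = ((left + state + right) >= 2).astype(int)
    flip = rng.random(state.shape) < p
    return np.where(flip, majority, state)

def alloy_blend(a, b, sr, p=0.3, seed=0):
    rng = np.random.default_rng(seed)
    _, _, A = stft(a, sr, nperseg=1024)
    _, _, B = stft(b, sr, nperseg=1024)
    n = min(A.shape[1], B.shape[1])
    A, B = A[:, :n], B[:, :n]
    state = rng.integers(0, 2, A.shape[0])   # one cell per frequency band
    out = np.empty_like(A)
    for k in range(n):                       # state 0 -> source a, 1 -> source b
        state = pca_step(state, p, rng)
        out[:, k] = np.where(state == 1, B[:, k], A[:, k])
    _, y = istft(out, sr, nperseg=1024)
    return y
```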
Graph-Based Audio Looping and Granulation
In this paper we describe similarity graphs computed from time-frequency analysis as a guide for audio playback, with the aim
of extending the content of fixed recordings in creative applications. We explain the creation of the graph from the distance between spectral frames, as well as several features computed from
the graph, such as methods for onset detection, beat detection, and
cluster analysis. Several playback algorithms can be devised based
on conditional pruning of the graph using these methods. We describe examples for looping, granulation, and automatic montage.
Topologizing Sound Synthesis via Sheaves
In recent years, a range of topological methods have emerged for processing digital signals. In this paper we show how the construction of topological filters via sheaves can be used to topologize existing sound synthesis methods. We illustrate this process on two classes of synthesis approaches: (1) those based on linear time-invariant digital filters and (2) those based on oscillators defined on a circle. We use the computationally friendly approach of modeling topologies via a simplicial complex, and we attach our classical synthesis methods to them via sheaves. In particular, we explore examples of simplicial topologies that mimic sampled lines and loops. Over these spaces we realize concrete examples of simple discrete harmonic oscillators (resonant filters) and simple comb-filter-based algorithms (such as Karplus-Strong), as well as frequency modulation.
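The sheaf construction itself is beyond a short example; the sketch below shows only the classical building block that the paper attaches to a simplicial loop, a Karplus-Strong comb filter whose circular delay line plays the role of the sampled loop.

```python
import numpy as np

def karplus_strong(freq: float, sr: int = 44100, dur: float = 1.0,
                   damping: float = 0.996, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    period = int(sr / freq)                  # length of the sampled loop
    loop = rng.uniform(-1, 1, period)        # initial noise excitation
    out = np.empty(int(sr * dur))
    for n in range(len(out)):
        i = n % period                       # walk around the loop
        j = (n + 1) % period
        loop[i] = damping * 0.5 * (loop[i] + loop[j])  # lowpassed feedback
        out[n] = loop[i]
    return out

pluck = karplus_strong(220.0)
```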
Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These
automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection
where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm
should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for
automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm
to replace manual parameter tuning, followed by the computation
of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods
of the Aubio library, using a dataset of monophonic acoustic guitar
recordings. Results show that the proposed solution is effective in
reducing the human effort required in the optimization process: it
replaced more than two days of manual parameter tuning with 13
hours and 34 minutes of automated computation. Moreover, the
resulting performance was comparable to that obtained by manual
optimization.
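A minimal sketch of the two-stage idea, evolutionary search followed by Pareto-front extraction; the evaluate() function is a hypothetical stand-in for running an Aubio detector and measuring detection error and latency.

```python
import numpy as np

def evaluate(params: np.ndarray) -> tuple[float, float]:
    """Hypothetical objectives; both should be minimized."""
    error = (params[0] - 0.3) ** 2 + 0.1 * params[1]
    latency = 1.0 / (params[1] + 0.05)
    return error, latency

def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated points (minimization)."""
    mask = np.ones(len(scores), dtype=bool)
    for i in range(len(scores)):
        dominates_i = (np.all(scores <= scores[i], axis=1)
                       & np.any(scores < scores[i], axis=1))
        if dominates_i.any():
            mask[i] = False
    return mask

rng = np.random.default_rng(0)
pop = rng.uniform(0, 1, (64, 2))                     # parameter vectors
for _ in range(50):                                  # simple (mu + lambda) loop
    children = np.clip(pop + rng.normal(0, 0.05, pop.shape), 0, 1)
    both = np.vstack([pop, children])
    scores = np.array([evaluate(p) for p in both])
    ranks = np.argsort(scores.sum(axis=1))           # crude scalarized survival
    pop = both[ranks[:64]]
scores = np.array([evaluate(p) for p in pop])
front = pop[pareto_front(scores)]                    # accuracy/latency trade-offs
```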
Improving Synthesizer Programming From Variational Autoencoders Latent Space
Deep neural networks have recently been applied to the task of
automatic synthesizer programming, i.e., finding optimal values
of sound synthesis parameters in order to reproduce a given input
sound. This paper focuses on generative models, which can infer
parameters as well as generate new sets of parameters or perform
smooth morphing effects between sounds.
We introduce new models to ensure scalability and to increase performance by using heterogeneous representations of parameters as numerical and categorical random variables. Moreover, a spectral variational autoencoder architecture with multi-channel input is proposed in order to improve the inference of parameters related to the pitch and intensity of input sounds.
Model performance was evaluated according to several criteria such as parameter estimation error and audio reconstruction accuracy. Training and evaluation were performed using a 30k-preset dataset, which is published with this paper. The results demonstrate significant improvements in terms of parameter inference and audio accuracy, and show that the presented models can be used with subsets or full sets of synthesizer parameters.
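A minimal sketch of the heterogeneous parameter representation, with illustrative dimensions of our own: numerical parameters are regressed with MSE while categorical ones (e.g. waveform selectors) are classified with cross-entropy.

```python
import torch
import torch.nn.functional as F

N_NUM, N_CAT, N_CLASSES, LATENT = 24, 8, 5, 64

decoder = torch.nn.Sequential(
    torch.nn.Linear(LATENT, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, N_NUM + N_CAT * N_CLASSES),
)

def parameter_loss(z, num_target, cat_target):
    """z: (B, LATENT); num_target: (B, N_NUM); cat_target: (B, N_CAT) long."""
    out = decoder(z)
    num_pred = torch.sigmoid(out[:, :N_NUM])            # normalized knob values
    cat_logits = out[:, N_NUM:].view(-1, N_CAT, N_CLASSES)
    mse = F.mse_loss(num_pred, num_target)
    ce = F.cross_entropy(cat_logits.reshape(-1, N_CLASSES),
                         cat_target.reshape(-1))
    return mse + ce

z = torch.randn(16, LATENT)
loss = parameter_loss(z, torch.rand(16, N_NUM),
                      torch.randint(0, N_CLASSES, (16, N_CAT)))
```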
Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models
Virtual analog (VA) modeling using neural networks (NNs) has
great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due
to their connection with discrete nodal analysis. Furthermore, VA
models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However,
exposure to ground truth information during training can leave the
models susceptible to error accumulation in a free-running mode,
also known as “exposure bias” in machine learning literature. This
paper presents a unified framework for treating the previously
proposed state trajectory network (STN) and gated recurrent unit
(GRU) networks as special cases of discrete nodal analysis. We
propose a novel circuit state-matching mechanism for the GRU
and experimentally compare the aforementioned networks in terms of state-matching performance during training and exposure bias during inference. Experimental results from modeling
a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation
through time. Furthermore, the proposed state matching mechanism improves the GRU modeling performance of an overdrive
pedal and a phaser pedal, especially in the presence of external
modulation, apparent in a phaser circuit.
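A minimal sketch of truncated backpropagation through time with an optional state-matching term; the GRU size, chunk length, and the exact form of the penalty are illustrative assumptions of ours.

```python
import torch

gru = torch.nn.GRU(input_size=1, hidden_size=16, batch_first=True)
head = torch.nn.Linear(16, 1)
opt = torch.optim.Adam(list(gru.parameters()) + list(head.parameters()), 1e-3)

def tbptt_epoch(x, y, states=None, chunk=256, state_weight=0.1):
    """x, y: (1, T, 1) input/target; states: (1, T, 16) circuit states or None."""
    h = None
    for t0 in range(0, x.shape[1] - chunk, chunk):
        xs, ys = x[:, t0:t0 + chunk], y[:, t0:t0 + chunk]
        out, h = gru(xs, h)
        loss = torch.nn.functional.mse_loss(head(out), ys)
        if states is not None:  # match hidden units to measured circuit states
            loss = loss + state_weight * torch.nn.functional.mse_loss(
                out, states[:, t0:t0 + chunk])
        opt.zero_grad()
        loss.backward()
        opt.step()
        h = h.detach()          # truncate the gradient between chunks

# Example with synthetic data: one second of noise through a soft clipper.
x = torch.randn(1, 44100, 1)
tbptt_epoch(x, torch.tanh(x))
```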
Transition-Aware: A More Robust Approach for Piano Transcription
Piano transcription is a classic problem in music information retrieval, and a growing number of deep-learning-based transcription methods have been proposed in recent years. In 2019, Google Brain published a large-scale piano transcription dataset, MAESTRO, on which the Onsets and Frames transcription approach proposed by Hawthorne achieved a stunning onset F1 score of 94.73%. Unlike the annotation method of Onsets and Frames, the Transition-aware model presented in this paper annotates the attack process of piano signals, called the attack transition, across multiple frames instead of only marking the onset frame. In this way, the piano signal around the onset time is taken into account, making piano onset detection more stable and robust. Transition-aware achieves a higher transcription F1 score than Onsets and Frames on both the MAESTRO and MAPS datasets, eliminating many spurious note detections. This indicates that the Transition-aware approach generalizes better across datasets.
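A minimal sketch of multi-frame attack-transition labeling as we understand it; the window width and the decaying weights are our own illustrative choices.

```python
import numpy as np

def transition_targets(onset_frames, n_frames, width=3, decay=0.5):
    """Soft target vector for one pitch: instead of a single 1 at the onset
    frame, a short window of frames around each onset is labeled with
    decaying weights, giving the network a wider, more robust target."""
    target = np.zeros(n_frames)
    for f in onset_frames:
        for d in range(-width, width + 1):
            k = f + d
            if 0 <= k < n_frames:
                target[k] = max(target[k], decay ** abs(d))
    return target

# Onsets at frames 10 and 40 in a 60-frame excerpt.
print(transition_targets([10, 40], 60))
```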
Quality Diversity for Synthesizer Sound Matching
It is difficult to adjust the parameters of a complex synthesizer to
create the desired sound. As such, sound matching, the estimation of synthesis parameters that can replicate a certain sound, is
a task that has often been researched, utilizing optimization methods such as the genetic algorithm (GA). In this paper, we introduce a
novelty-based objective for GA-based sound matching. Our contribution is two-fold. First, we show that the novelty objective is
able to improve the quality of sound matching by maintaining phenotypic diversity in the population. Second, we introduce a quality diversity approach to the problem of sound matching, aiming
to find a diverse set of matching sounds. We show that the novelty objective is effective in producing high-performing solutions
that are diverse in terms of specified audio features. This approach
allows for a new way of discovering sounds and exploring the capabilities of a synthesizer.
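A minimal sketch of a novelty objective, assuming novelty is scored as the mean distance to the k nearest neighbors in audio-feature space over the current population plus an archive of past individuals.

```python
import numpy as np

def novelty_scores(features: np.ndarray, archive: np.ndarray, k: int = 5):
    """features: (N, D) audio features of the population; archive: (M, D)."""
    pool = np.vstack([features, archive]) if len(archive) else features
    scores = np.empty(len(features))
    for i, f in enumerate(features):
        d = np.linalg.norm(pool - f, axis=1)
        d = np.sort(d)[1:k + 1]        # skip the zero distance to itself
        scores[i] = d.mean()
    return scores                      # higher = more novel

# Individuals whose features sit far from the crowd score highest and are
# rewarded alongside (or instead of) the match-quality objective.
pop = np.random.default_rng(0).normal(size=(32, 8))
print(novelty_scores(pop, archive=np.empty((0, 8))))
```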
An Audio-Visual Fusion Piano Transcription Approach Based on Strategy
Piano transcription is a fundamental problem in the field of music information retrieval. At present, most transcription studies are based on audio or video alone, and audio-visual fusion has received little discussion. In this paper, a piano transcription model based on strategy fusion is proposed, in which the transcription results of the video model are used to assist audio transcription. Due to the current lack of datasets for audio-visual fusion, the OMAPS dataset is proposed in this paper. Our strategy fusion model achieves a 92.07% F1 score on the OMAPS dataset. The transcription model based on feature fusion is also compared with the one based on strategy fusion. The experimental results show that the transcription model based on strategy fusion achieves better results than the one based on feature fusion.
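A minimal sketch of decision-level (strategy) fusion under a rule of our own devising: frame-wise audio onset probabilities are boosted or suppressed by the video model's per-key activity before thresholding.

```python
import numpy as np

def strategy_fuse(audio_prob, video_prob, boost=1.2, suppress=0.6,
                  video_thresh=0.5, onset_thresh=0.5):
    """audio_prob, video_prob: (frames, 88) probabilities per piano key."""
    video_active = video_prob > video_thresh
    fused = np.where(video_active, audio_prob * boost, audio_prob * suppress)
    return np.clip(fused, 0.0, 1.0) > onset_thresh   # boolean onset map

rng = np.random.default_rng(0)
onsets = strategy_fuse(rng.random((100, 88)), rng.random((100, 88)))
```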