TIV.lib: An Open-Source Library for the Tonal Description of Musical Audio
In this paper, we present TIV.lib, an open-source library for the content-based tonal description of musical audio signals. Its main novelty lies in the perceptually-inspired Tonal Interval Vector space based on the Discrete Fourier Transform, from which multiple instantaneous and global representations, descriptors, and metrics are computed, e.g., harmonic change, dissonance, diatonicity, and musical key. The library is cross-platform, implemented
in Python and the graphical programming language Pure Data, and
can be used in both online and offline scenarios. Of note is its
potential for enhanced Music Information Retrieval, where tonal
descriptors sit at the core of numerous methods and applications.
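The core construction behind the TIV space, a weighted DFT of a chroma vector, can be sketched in a few lines. This is a minimal illustration assuming numpy, not TIV.lib's actual API; the function name and the perceptual weights used here are assumptions for the sketch, not necessarily the library's values.

```python
import numpy as np

def tonal_interval_vector(chroma):
    """Toy TIV: weighted DFT coefficients 1-6 of a normalized chroma.

    Hypothetical helper for illustration only; the perceptual weights
    below are assumed, not taken from TIV.lib.
    """
    chroma = np.asarray(chroma, dtype=float)
    chroma = chroma / chroma.sum()             # energy (L1) normalization
    spectrum = np.fft.fft(chroma)              # 12-point DFT
    weights = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])
    return weights * spectrum[1:7]             # complex 6-D tonal vector

# Example: chroma of a C-major triad (pitch classes C, E, G active)
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
tiv = tonal_interval_vector(c_major)           # six complex coefficients
```

A useful property of this construction is that transposing the chroma only rotates the phases of the coefficients, so magnitude-based descriptors (e.g., dissonance) are transposition-invariant.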
Recognizing Guitar Effects and Their Parameter Settings
Guitar effects are commonly used in popular music to shape the
guitar sound to fit specific genres or to create more variety within
musical compositions. The sound is determined not only by the choice of guitar effect, but also heavily by the parameter settings of that effect. This paper introduces a method to
estimate the parameter settings of guitar effects, which makes it
possible to reconstruct the effect and its settings from an audio
recording of a guitar. The method utilizes audio feature extraction and shallow neural networks, which are trained on data created specifically for this task. The results show that the method is generally suited to this task, with average estimation errors between ±5% and ±16% of the respective parameter scales, and that it could potentially perform near the level of a human expert.
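The pipeline described above, audio features in, a normalized parameter estimate out of a shallow network, can be sketched as follows. Everything in this sketch is a synthetic stand-in and an assumption: the random "features" replace real audio descriptors, and the tiny numpy MLP replaces the paper's trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the task: each row of X plays the role of a
# feature vector extracted from a processed guitar recording, and y is
# the normalized effect parameter setting in [0, 1] to be estimated.
X = rng.uniform(size=(200, 8))
true_w = rng.uniform(size=8)
y = X @ true_w / true_w.sum()

# A shallow network: one tanh hidden layer, linear output, trained by
# plain gradient descent on the mean squared error.
W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, (h @ W2 + b2).ravel()

for _ in range(3000):
    h, pred = forward(X)
    err = pred - y                              # residual per example
    gW2 = h.T @ err[:, None] / len(X)           # output-layer gradients
    gb2 = np.array([err.mean()])
    dh = (err[:, None] @ W2.T) * (1.0 - h ** 2) # backprop through tanh
    gW1 = X.T @ dh / len(X)
    gb1 = dh.mean(axis=0)
    W2 -= 0.5 * gW2; b2 -= 0.5 * gb2
    W1 -= 0.5 * gW1; b1 -= 0.5 * gb1

_, pred = forward(X)
mae = np.abs(pred - y).mean()   # average estimation error on the [0, 1] scale
```

On this synthetic task the mean absolute error lands well within the ±16% range the abstract reports for real parameter scales, though the two numbers are of course not directly comparable.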
Diet Deep Generative Audio Models With Structured Lottery
Deep learning models have provided extremely successful solutions in most audio application fields. However, the high accuracy of these models comes at the expense of a tremendous computational cost. This aspect is almost always overlooked when evaluating the quality of proposed models, even though models should not be assessed without taking their complexity into account. This aspect is especially critical in audio applications, which heavily rely on specialized embedded hardware with real-time constraints.
In this paper, we build on recent observations that deep models are highly overparameterized, by studying the lottery ticket hypothesis on deep generative audio models. This hypothesis states
that extremely efficient small sub-networks exist in deep models
and would provide higher accuracy than larger models if trained in
isolation. However, lottery tickets are found by relying on unstructured masking, which means that the resulting models provide no gain in either disk size or inference time. Instead, we develop a method that performs structured trimming. We show that this requires relying on global selection, and we introduce a specific criterion based on mutual information.
First, we confirm the surprising result that smaller models provide higher accuracy than their large counterparts. We further
show that we can remove up to 95% of the model weights without significant degradation in accuracy. Hence, we can obtain very
light models for generative audio across popular methods such as WaveNet, SING, and DDSP, which are up to 100 times smaller with commensurate accuracy. We study the theoretical bounds for embedding these models on Raspberry Pi and Arduino, and show that we can obtain generative models on CPU with quality equivalent to that of large GPU models. Finally, we discuss the possibility of implementing deep generative audio models on embedded platforms.
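The difference between unstructured masking and structured trimming can be made concrete with a toy example: structured trimming deletes whole hidden units (a column of one weight matrix and the matching row of the next), so the pruned model is genuinely smaller on disk and faster at inference. This numpy sketch scores units with a simple global weight-norm criterion as a stand-in; the paper's actual criterion is based on mutual information and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two dense layers of a toy network: 16 inputs -> 64 hidden -> 4 outputs.
W1 = rng.normal(size=(16, 64))   # input-to-hidden weights
W2 = rng.normal(size=(64, 4))    # hidden-to-output weights

# Global selection: give every hidden unit one score across the whole
# network (here, the L2 norm of its fan-in plus fan-out weights; a
# stand-in for the paper's mutual-information criterion) and keep the
# top 25% of units.
scores = np.linalg.norm(W1, axis=0) + np.linalg.norm(W2, axis=1)
keep = np.argsort(scores)[-16:]  # indices of the 16 best-scoring units

# Structured trimming: drop entire columns/rows, shrinking the matrices.
W1_pruned = W1[:, keep]          # (16, 16) instead of (16, 64)
W2_pruned = W2[keep, :]          # (16, 4) instead of (64, 4)
```

Because whole rows and columns disappear, the pruned matrices here occupy a quarter of the original storage and cost proportionally fewer multiply-adds, whereas an unstructured mask of the same sparsity would leave the matrix shapes, and hence disk size and inference time, unchanged.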