Download The DESAM Toolbox: Spectral Analysis of Musical Audio
This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. The toolbox is rather aimed at providing a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different “mid-level” representations. After motivating the need for such a toolbox, this paper gives an overview of its overall organization and describes all available functionalities.
Download Automatic Detection of Multiple, Cascaded Audio Effects in Guitar Recordings
This paper presents a method to detect and distinguish single and multiple audio effects in monophonic electric guitar recordings. It is based on spectral analysis of audio segments located in the sustain part of guitar tones. Overall, 541 spectral, cepstral and harmonic features are extracted from short-time spectra of the audio segments. Support Vector Machines are used in combination with feature selection and transform techniques for automatic classification based on the extracted feature vectors. A novel database consisting of approximately 50,000 guitar tones was assembled for the purpose of evaluation. Classification accuracy reached 99.2% for the detection and distinction of arbitrary combinations of six frequently used audio effects.
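The full pipeline (541 features plus an SVM) is far richer than can be reproduced here, but the flavor of the feature extraction stage can be sketched with a few commonly used spectral descriptors. The following Python/NumPy toy is illustrative only, not the paper's implementation; all function names and parameter values are made up, and it merely shows how a distortion effect shifts such features on a synthetic tone:

```python
import numpy as np

def spectral_features(x, sr):
    """A few spectral descriptors of the kind used for effect
    classification: centroid, rolloff and flatness of the magnitude
    spectrum of one windowed sustain-segment frame."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-12)
    cumulative = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (mag.mean() + 1e-12)
    return np.array([centroid, rolloff, flatness])

sr = 16000
t = np.arange(2048) / sr
clean = np.sin(2 * np.pi * 440 * t)   # undistorted tone
distorted = np.tanh(10 * clean)       # hard clipping adds odd harmonics
f_clean = spectral_features(clean, sr)
f_dist = spectral_features(distorted, sr)
print(f_dist[0] > f_clean[0])  # distortion raises the spectral centroid
```

A real system would stack many such descriptors per segment and hand the resulting vectors to a trained classifier.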
Download Comparison of Pitch Trackers for Real-Time Guitar Effects
A comparison of various pitch tracking algorithms is presented, and the suitability of these pitch trackers for real-time pitch tracking of guitar signals is investigated. The pitch tracking algorithms are described, and their performance regarding latency and accuracy is evaluated.
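As an illustration of the kind of algorithm being compared, here is a minimal autocorrelation pitch tracker in Python/NumPy. It is a generic textbook sketch, not one of the paper's evaluated implementations, and the parameter values are illustrative:

```python
import numpy as np

def autocorr_pitch(x, sr, fmin=70.0, fmax=1000.0):
    """Naive autocorrelation pitch tracker: pick the lag whose
    length-normalized autocorrelation peaks inside the plausible
    guitar pitch range."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lags = np.arange(len(ac))
    lo, hi = int(sr / fmax), int(sr / fmin)
    ac_norm = ac[lo:hi] / (len(x) - lags[lo:hi])  # unbiased estimate
    return sr / (lo + np.argmax(ac_norm))

sr = 44100
tone = np.sin(2 * np.pi * 196.0 * np.arange(2048) / sr)  # open G string (G3)
print(round(autocorr_pitch(tone, sr), 1))  # close to 196.0 Hz
```

The 2048-sample frame alone implies roughly 46 ms of analysis delay at 44.1 kHz, which is exactly the kind of latency/accuracy trade-off the paper evaluates.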
Download Approximating Measured Reverberation Using A Hybrid Fixed/Switched Convolution Structure
An efficient reverberator structure is proposed for approximating measured reverberation. A fixed convolution matching the early portion of a measured impulse response is crossfaded with a switched convolution reverberator drawing its switched convolution section from the late-field of the measured impulse response. In this way, the early portion of the measured impulse response is precisely reproduced, and the late-field equalization and decay rates are efficiently approximated. To use segments of the measured impulse response, the switched convolution structure is modified to include a normalization filter to account for the decay of the late-field between the nominal fixed/switched crossfade time and the time of the selected segment. Further, the measured impulse response late-field is extended below its noise floor in anticipation of the normalization. This structure provides psychoacoustically accurate synthesis of the measured impulse response using less than half a second of convolution, irrespective of the length of the measured impulse response. In addition, the structure provides direct control over the equalization and late-field frequency-dependent decay rate. Emulations of an EMT 140 plate reverberator and of a marble lobby impulse response are presented.
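The decay normalization at the heart of the structure can be illustrated on a synthetic exponentially decaying impulse response. The sketch below is a simplification under stated assumptions: in the real structure the early part and the borrowed segment are convolution sections, while here only the level-matching of a late segment against the local level at the crossfade time is shown, and all numbers (sample rate, decay rate, segment times) are invented:

```python
import numpy as np

sr = 8000
n = sr  # one-second synthetic "measured" IR
rng = np.random.default_rng(0)
t = np.arange(n) / sr
ir = rng.standard_normal(n) * 10 ** (-3.0 * t)  # T60 = 1 s late field

crossfade = int(0.05 * sr)   # fixed/switched handover at 50 ms
seg_start = int(0.30 * sr)   # late segment borrowed from 300 ms
seg = ir[seg_start:seg_start + crossfade].copy()

# Normalization: undo the late-field decay between the crossfade time
# and the time the borrowed segment was taken from (gain > 1: boost).
decay_per_s = 10 ** (-3.0)   # amplitude ratio after one second
seg *= decay_per_s ** ((crossfade - seg_start) / sr)

fade = np.linspace(0.0, 1.0, crossfade)
hybrid = ir[:crossfade] * (1.0 - fade) + seg * fade

# The boosted segment now sits at the local level of the measured IR.
print(np.std(seg) / np.std(ir[crossfade:2 * crossfade]))  # close to 1.0
```

Extending the late field below the noise floor, as the paper describes, matters precisely because this normalization amplifies whatever is present in the borrowed segment.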
Download A Segmental Spectro-Temporal Model of Musical Timbre
We propose a new statistical model of musical timbre that handles the different segments of the temporal envelope (attack, sustain and release) separately in order to account for their different spectral and temporal behaviors. The model is based on a reduced-dimensionality representation of the spectro-temporal envelope. Temporal coefficients corresponding to the attack and release segments are subjected to explicit trajectory modeling based on a non-stationary Gaussian Process. Coefficients corresponding to the sustain phase are modeled as a multivariate Gaussian. A compound similarity measure associated with the segmental model is proposed and successfully tested in instrument classification experiments. Apart from its use in a statistical framework, the modeling method allows intuitive and informative visualizations of the characteristics of musical timbre.
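The sustain-phase part of the model can be sketched directly: fit a multivariate Gaussian to per-frame coefficients and compare two instruments with a Kullback-Leibler-based divergence, a common ingredient of such similarity measures. The coefficients below are synthetic stand-ins, not actual spectro-temporal envelope coefficients, and this is not the paper's compound measure:

```python
import numpy as np

def fit_gaussian(frames):
    """Model sustain-phase coefficients as a multivariate Gaussian."""
    mu = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(frames.shape[1])
    return mu, cov

def gauss_kl(p, q):
    """Closed-form KL divergence between two Gaussians."""
    mu0, c0 = p
    mu1, c1 = q
    k = len(mu0)
    c1_inv = np.linalg.inv(c1)
    d = mu1 - mu0
    return 0.5 * (np.trace(c1_inv @ c0) + d @ c1_inv @ d - k
                  + np.log(np.linalg.det(c1) / np.linalg.det(c0)))

rng = np.random.default_rng(1)
inst_a = rng.normal([0.0, 1.0, 0.5], 0.1, size=(200, 3))  # stand-in frames
inst_b = rng.normal([0.8, 0.2, 0.4], 0.1, size=(200, 3))
ga, gb = fit_gaussian(inst_a), fit_gaussian(inst_b)
sym = gauss_kl(ga, gb) + gauss_kl(gb, ga)  # symmetrized divergence
print(sym > gauss_kl(ga, ga))  # a timbre is closer to itself than to another
```

The attack and release segments would additionally need trajectory models, since a single Gaussian discards their temporal ordering.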
Download Polyphonic Instrument Recognition for Exploring Semantic Similarities in Music
Similarity is a key concept for estimating associations among a set of objects. Music similarity is usually exploited to retrieve relevant items from a dataset containing audio tracks. In this work, we approach the problem of semantic similarity between short pieces of music by analysing their instrumentations. Our aim is to label audio excerpts with the most salient instruments (e.g. piano, human voice, drums) and use this information to estimate a semantic relation (i.e. similarity) between them. We present three different methods for integrating frame-based classifier decisions along an audio excerpt to derive its instrumental content. Similarity between audio files is then determined solely by their attached labels. We evaluate our algorithm in terms of label assignment and similarity assessment, observing significant differences when comparing it to commonly used audio similarity metrics. We test on music from various genres of Western music to simulate real-world scenarios.
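Once excerpts are labelled, the similarity step reduces to comparing label sets. One simple possibility, shown below purely as an illustration (the paper does not necessarily use this exact measure), is the Jaccard index over instrument labels:

```python
def label_similarity(labels_a, labels_b):
    """Jaccard similarity between the instrument-label sets of two
    excerpts: shared labels divided by all labels seen in either."""
    a, b = set(labels_a), set(labels_b)
    return len(a & b) / len(a | b) if a | b else 1.0

jazz_trio = {"piano", "double bass", "drums"}
pop_song = {"human voice", "guitar", "drums", "piano"}
solo_violin = {"violin"}
print(label_similarity(jazz_trio, pop_song))     # 2 shared of 5 total: 0.4
print(label_similarity(jazz_trio, solo_violin))  # no overlap: 0.0
```

Because the comparison operates on semantic labels rather than on frame-level audio features, two excerpts with very different spectra but the same instrumentation come out as similar.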
Download Between Physics and Perception: Signal Models for High Level Audio Processing
The use of signal models is one of the key factors enabling us to establish high-quality signal transformation algorithms with intuitive high-level control parameters. In the present article we will discuss signal models, and the signal transformation algorithms that are based on these models, in relation to the physical properties of the sound source and the properties of human sound perception. We will argue that the implementation of perceptually intuitive, high-quality signal transformation algorithms requires strong links between the signal models and the perceptually relevant physical properties of the sound source. We will present an overview of the history of two sound models that are used for sound transformation and will show how the past and future evolution of sound transformation algorithms is driven by our understanding of the physical world.
Download A Shape-Invariant Phase Vocoder for Speech Transformation
This paper proposes a new method for shape-invariant real-time modification of speech signals. The method can be understood as a frequency-domain SOLA algorithm that uses the phase vocoder algorithm for phase synchronization. Compared to time-domain SOLA, the new implementation provides improved time synchronization during overlap-add and improved quality of the noise components of the transformed speech signals. The algorithm has been compared in two perceptual tests with recent implementations of PSOLA and HNM algorithms, demonstrating very satisfying performance. Because the quality of transformed signals stays constant over a wide range of transformation parameters, the algorithm is well suited for real-time gender and age transformations.
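For orientation, a bare-bones time-domain overlap-add stretcher is sketched below. The paper's method adds SOLA-style synchronization carried out in the frequency domain via the phase vocoder, which this toy deliberately omits; frame and hop sizes are illustrative:

```python
import numpy as np

def ola_stretch(x, rate, frame=1024, hop_out=256):
    """Bare-bones overlap-add time scaling: read analysis frames every
    rate * hop_out samples, write them every hop_out samples under a
    Hann window. (SOLA/WSOLA would additionally search for the
    best-aligned offset before each overlap-add.)"""
    win = np.hanning(frame)
    hop_in = int(round(rate * hop_out))
    n_frames = (len(x) - frame) // hop_in + 1
    out = np.zeros(n_frames * hop_out + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop_out : i * hop_out + frame] += x[i * hop_in : i * hop_in + frame] * win
        norm[i * hop_out : i * hop_out + frame] += win
    return out / np.maximum(norm, 1e-8)

sr = 8000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
y = ola_stretch(x, rate=2.0)  # twice as fast, roughly half the length
print(round(len(y) / len(x), 2))  # → 0.58 (half, plus the window tail)
```

Without the phase synchronization step, periodic signals processed this way suffer the phase discontinuities at frame boundaries that shape-invariant methods are designed to avoid.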
Download An Enhanced Modulation Vocoder for Selective Transposition of Pitch
In previous papers, the concept of the modulation vocoder (MODVOC) was introduced and its general capability to perform selective transposition on polyphonic music content was pointed out. This enables applications that aim at changing the key mode of pre-recorded PCM music samples. In this paper, two enhancement techniques for selective pitch transposition by the MODVOC are proposed. The performance of the selective transposition application and the merit of these techniques are benchmarked with results obtained from a specially designed listening test methodology capable of handling extreme pitch changes with respect to the original audio stimuli. Results of this subjective perceptual quality assessment are presented for items converted between minor and major key mode by the MODVOC and, additionally, by the first commercially available software that is also capable of handling this task.
Download Independent Manipulation of High-Level Spectral Envelope Shape Features for Sound Morphing by Means of Evolutionary Computation
The aim of sound morphing is to obtain a sound that falls perceptually between two (or more) sounds. Ideally, we want to morph perceptually relevant features of sounds and be able to manipulate them independently. In this work we present a method to obtain perceptually intermediate spectral envelopes guided by high-level spectral shape descriptors, and a technique that employs evolutionary computation to independently manipulate the timbral features captured by the descriptors. High-level descriptors are measures of the acoustic correlates of salient timbre dimensions derived from perceptual studies, such that manipulating the descriptors corresponds to potentially interesting timbral variations.
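The descriptor-guided search can be sketched with the spectral centroid as the high-level shape descriptor and a tiny (1+1)-style evolution strategy standing in for the paper's evolutionary algorithm. The envelopes, descriptor choice and all parameters below are synthetic assumptions made for the sketch:

```python
import numpy as np

def centroid(env, freqs):
    """Spectral centroid of an envelope: an acoustic correlate of
    perceived brightness, one typical high-level shape descriptor."""
    return (freqs * env).sum() / env.sum()

freqs = np.linspace(0.0, 4000.0, 256)
env_a = np.exp(-freqs / 500.0)                    # dark source envelope
env_b = np.exp(-((freqs - 2500.0) / 400.0) ** 2)  # bright target envelope
target = 0.5 * (centroid(env_a, freqs) + centroid(env_b, freqs))

def morph_err(w):
    """Distance of the morphed envelope's centroid from the target."""
    return abs(centroid((1.0 - w) * env_a + w * env_b, freqs) - target)

# (1+1)-style evolution strategy with a shrinking mutation step: keep a
# candidate weight only if it brings the morph's centroid closer.
rng = np.random.default_rng(2)
w, err_w = 0.5, morph_err(0.5)
for i in range(200):
    cand = float(np.clip(w + rng.normal(0.0, 0.2 * 0.98 ** i), 0.0, 1.0))
    if morph_err(cand) < err_w:
        w, err_w = cand, morph_err(cand)

print(err_w < 10.0)  # morphed centroid within 10 Hz of the midpoint
```

The point the sketch makes is the one motivating the paper: a linear interpolation weight of 0.5 does not generally yield the descriptor midpoint, so the weight (or, in the paper, a richer envelope parameterization) must be searched for.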