Automatic Segmentation of the Temporal Evolution of Isolated Acoustic Musical Instrument Sounds Using Spectro-Temporal Cues
The automatic segmentation of isolated musical instrument sounds according to their temporal evolution is not a trivial task. It requires a model capable of accurately capturing regions such as the attack, decay, sustain and release for many types of instruments with different modes of excitation. The traditional ADSR amplitude envelope model does not apply universally to acoustic musical instrument sounds with different excitation methods because it uses strictly amplitude information and supposes that all sounds manifest the same temporal evolution. We present an automatic segmentation technique based on a more realistic model of the temporal evolution of many types of acoustic musical instruments that incorporates both temporal and spectro-temporal cues. The method allows a robust and more perceptually relevant automatic segmentation of the isolated sounds of many musical instruments that fit the model.
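As a point of reference for the amplitude-only ADSR approach this abstract argues against, the sketch below segments attack and release boundaries from an RMS envelope alone. It is a minimal illustration, assuming a hypothetical mono signal array and simple threshold rules; the paper's actual method additionally relies on spectro-temporal cues and is not shown here.

```python
import numpy as np

def rms_envelope(x, frame=1024, hop=256):
    """Frame-wise RMS amplitude envelope of a mono signal x."""
    n = 1 + max(0, len(x) - frame) // hop
    return np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2)) for i in range(n)])

def naive_attack_release(x, sr, frame=1024, hop=256, lo=0.1, hi=0.9):
    """Amplitude-only attack/release boundaries (the model the paper improves on).

    Returns (attack_end_s, release_start_s): first time the envelope reaches
    hi * max and last time it stays above lo * max.
    """
    env = rms_envelope(x, frame, hop)
    peak = env.max()
    attack_end = np.argmax(env >= hi * peak)                          # first frame near the peak
    release_start = len(env) - 1 - np.argmax(env[::-1] >= lo * peak)  # last clearly audible frame
    to_s = hop / sr
    return attack_end * to_s, release_start * to_s

# toy usage: a decaying 440 Hz tone
sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
x = np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t)
print(naive_attack_release(x, sr))
```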
Time-Dependent Parametric and Harmonic Templates in Non-Negative Matrix Factorization
This paper presents a new method, derived from Non-negative Matrix Factorization (NMF), to decompose musical spectrograms. The method uses time-varying harmonic templates (atoms) that are parametric: these atoms correspond to musical notes. Templates are synthesized from the values of the parameters, which are learnt in an NMF framework. This parameterization makes it possible to accurately model musical effects (such as vibrato) that standard NMF models inaccurately.
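For readers unfamiliar with the NMF framework the paper builds on, the following sketch decomposes a magnitude spectrogram against a dictionary of fixed harmonic note templates, updating only the activations with multiplicative KL-divergence rules. The dictionary construction, note range, and data are placeholders, and the templates here are static rather than the time-varying parametric atoms the paper proposes.

```python
import numpy as np

def harmonic_template(f0, freqs, n_harm=10, decay=0.8):
    """Gaussian peaks at the harmonics of f0 on an FFT frequency grid."""
    w = np.zeros_like(freqs)
    for h in range(1, n_harm + 1):
        w += (decay ** (h - 1)) * np.exp(-0.5 * ((freqs - h * f0) / 20.0) ** 2)
    return w / (w.sum() + 1e-12)

def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Multiplicative KL-divergence updates for activations H, with W held fixed."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T.sum(axis=1, keepdims=True) + eps)
    return H

# toy usage: dictionary of notes C2..B5 on a 4096-point FFT grid at 44.1 kHz
sr, n_fft = 44100, 4096
freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
midi = np.arange(36, 84)
W = np.stack([harmonic_template(440 * 2 ** ((m - 69) / 12), freqs) for m in midi], axis=1)
V = np.abs(np.random.rand(len(freqs), 50))    # stand-in for a magnitude spectrogram
H = nmf_activations(V, W)
print(H.shape)                                # (48, 50) note activations over time
```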
The DESAM Toolbox: Spectral Analysis of Musical Audio
This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. It is rather aimed at providing a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different “mid-level” representations. After motivating the need for such a toolbox, this paper offers an overview of its overall organization and describes all available functionalities.
Automatic Detection of Multiple, Cascaded Audio Effects in Guitar Recordings
This paper presents a method to detect and distinguish single and multiple audio effects in monophonic electric guitar recordings. It is based on spectral analysis of audio segments located in the sustain part of guitar tones. Overall, 541 spectral, cepstral and harmonic features are extracted from short-time spectra of the audio segments. Support Vector Machines are used in combination with feature selection and transform techniques for automatic classification based on the extracted feature vectors. A novel database that consists of approximately 50,000 guitar tones was assembled for the purpose of evaluation. Classification accuracy reached 99.2% for the detection and distinction of arbitrary combinations of six frequently used audio effects.
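The classification stage described here follows a common pattern: standardize feature vectors, select a feature subset, and train a Support Vector Machine. The sketch below illustrates that pipeline with scikit-learn on placeholder data; the feature values, class labels, and selection size are assumptions and do not reproduce the paper's 541-feature set or its guitar-tone database.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# placeholder data: rows = audio segments, columns = spectral/cepstral/harmonic features
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 541))          # stand-in for 541-dimensional feature vectors
y = rng.integers(0, 6, size=600)         # stand-in labels for six effect classes

clf = make_pipeline(
    StandardScaler(),                    # normalize feature ranges
    SelectKBest(f_classif, k=100),       # univariate feature selection
    SVC(kernel="rbf", C=10.0, gamma="scale"),
)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```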
Comparison of Pitch Trackers for Real-Time Guitar Effects
A comparison of various pitch tracking algorithms is presented, and their suitability for real-time pitch tracking of guitar signals is investigated. The algorithms are described and their performance is evaluated in terms of latency and accuracy.
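To make the latency/accuracy trade-off concrete, here is a minimal autocorrelation pitch estimator of the kind such comparisons typically include as a baseline; it is not any specific algorithm from the paper. The frame size is an assumption and directly sets the lower bound on latency.

```python
import numpy as np

def acf_pitch(frame, sr, fmin=70.0, fmax=1000.0):
    """Estimate f0 of one frame from the autocorrelation peak in [fmin, fmax]."""
    frame = frame - frame.mean()
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + np.argmax(acf[lag_min:lag_max])
    return sr / lag

# toy usage: a 2048-sample frame (about 46 ms at 44.1 kHz, the latency floor here)
sr = 44100
t = np.arange(2048) / sr
print(acf_pitch(np.sin(2 * np.pi * 196.0 * t), sr))   # about 196 Hz (guitar G3)
```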
Approximating Measured Reverberation Using A Hybrid Fixed/Switched Convolution Structure
An efficient reverberator structure is proposed for approximating measured reverberation. A fixed convolution matching the early portion of a measured impulse response is crossfaded with a switched convolution reverberator drawing its switched convolution section from the late field of the measured impulse response. In this way, the early portion of the measured impulse response is precisely reproduced, and the late-field equalization and decay rates are efficiently approximated. To use segments of the measured impulse response, the switched convolution structure is modified to include a normalization filter that accounts for the decay of the late field between the nominal fixed/switched crossfade time and the time of the selected segment. Further, the measured impulse response late field is extended below its noise floor in anticipation of the normalization. This structure provides psychoacoustically accurate synthesis of the measured impulse response using less than half a second of convolution, irrespective of the length of the measured impulse response. In addition, the structure provides direct control over the equalization and the late-field frequency-dependent decay rate. Emulations of an EMT 140 plate reverberator and a marble lobby impulse response are presented.
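The sketch below illustrates the basic idea of the split, under strong simplifying assumptions: the early portion of the impulse response is convolved exactly, while a single segment taken from the late field stands in for the switched convolution branch and is gain-normalized for the decay accumulated between the crossfade time and the segment time. The crossfade window, the actual switching, the normalization filter, and the noise-floor extension described in the paper are not shown; the broadband RT60-based gain is an assumption for illustration only.

```python
import numpy as np
from scipy.signal import fftconvolve

def hybrid_reverb(x, ir, sr, crossfade_s=0.05, seg_time_s=0.30, seg_len_s=0.40, rt60_s=1.2):
    """Very simplified hybrid structure: exact early convolution plus one
    decay-normalized late-field segment (rt60_s assumed known)."""
    n_cf = int(crossfade_s * sr)
    seg_start = int(seg_time_s * sr)
    seg = ir[seg_start:seg_start + int(seg_len_s * sr)]
    # undo the late-field decay accumulated between crossfade_s and seg_time_s
    gain = 10.0 ** (60.0 * (seg_time_s - crossfade_s) / (rt60_s * 20.0))
    y_early = fftconvolve(x, ir[:n_cf])
    y_late = fftconvolve(x, gain * seg)
    out = np.zeros(max(len(y_early), n_cf + len(y_late)))
    out[:len(y_early)] += y_early
    out[n_cf:n_cf + len(y_late)] += y_late     # late branch starts at the crossfade point
    return out

# toy usage with a synthetic exponentially decaying noise impulse response
sr = 44100
t = np.arange(int(1.2 * sr)) / sr
ir = np.random.randn(len(t)) * 10.0 ** (-60.0 * t / (1.2 * 20.0))
y = hybrid_reverb(np.random.randn(sr), ir, sr)
print(len(y) / sr)
```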
A Segmental Spectro-Temporal Model of Musical Timbre
We propose a new statistical model of musical timbre that handles the different segments of the temporal envelope (attack, sustain and release) separately in order to account for their different spectral and temporal behaviors. The model is based on a reduced-dimensionality representation of the spectro-temporal envelope. Temporal coefficients corresponding to the attack and release segments are subjected to explicit trajectory modeling based on a non-stationary Gaussian Process. Coefficients corresponding to the sustain phase are modeled as a multivariate Gaussian. A compound similarity measure associated with the segmental model is proposed and successfully tested in instrument classification experiments. Apart from its use in a statistical framework, the modeling method allows intuitive and informative visualizations of the characteristics of musical timbre.
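Only the sustain-phase part of such a model is easy to show compactly: fit a multivariate Gaussian to the sustain-phase coefficients of each sound and compare the fitted Gaussians, here with a symmetrized Kullback-Leibler divergence. This is a minimal sketch on placeholder data; the Gaussian Process trajectory modeling of the attack and release segments and the paper's compound similarity measure are not reproduced.

```python
import numpy as np

def fit_gaussian(X):
    """Fit mean and (regularized) covariance to sustain-phase coefficient frames."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov

def sym_kl(p, q):
    """Symmetrized KL divergence between two Gaussians given as (mean, covariance)."""
    def kl(a, b):
        mu_a, S_a = a
        mu_b, S_b = b
        d = len(mu_a)
        iSb = np.linalg.inv(S_b)
        diff = mu_b - mu_a
        return 0.5 * (np.trace(iSb @ S_a) + diff @ iSb @ diff - d
                      + np.log(np.linalg.det(S_b) / np.linalg.det(S_a)))
    return kl(p, q) + kl(q, p)

# toy usage: two instruments' sustain coefficients (frames x reduced dimensions)
rng = np.random.default_rng(1)
gauss_a = fit_gaussian(rng.normal(0.0, 1.0, size=(200, 8)))
gauss_b = fit_gaussian(rng.normal(0.5, 1.2, size=(200, 8)))
print(sym_kl(gauss_a, gauss_b))
```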
Polyphonic Instrument Recognition for Exploring Semantic Similarities in Music
Similarity is a key concept for estimating associations among a set of objects. Music similarity is usually exploited to retrieve relevant items from a dataset containing audio tracks. In this work, we approach the problem of semantic similarity between short pieces of music by analysing their instrumentations. Our aim is to label audio excerpts with the most salient instruments (e.g. piano, human voice, drums) and use this information to estimate a semantic relation (i.e. similarity) between them. We present three different methods for integrating frame-based classifier decisions along an audio excerpt to derive its instrumental content. Similarity between audio files is then determined solely by their attached labels. We evaluate our algorithm in terms of label assignment and similarity assessment, observing significant differences when comparing it to commonly used audio similarity metrics. In doing so, we test on music from various genres of Western music to simulate real-world scenarios.
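The label-based similarity step can be illustrated directly: aggregate frame-wise classifier outputs into an excerpt-level label set (here by a simple proportion threshold, one plausible integration strategy but not necessarily any of the three the paper compares) and compare excerpts by the Jaccard similarity of their label sets. The frame predictions and threshold below are hypothetical.

```python
from collections import Counter

def excerpt_labels(frame_predictions, min_share=0.2):
    """Keep instruments predicted in at least min_share of the frames."""
    counts = Counter(frame_predictions)
    n = len(frame_predictions)
    return {inst for inst, c in counts.items() if c / n >= min_share}

def label_similarity(labels_a, labels_b):
    """Jaccard similarity between two excerpts' label sets."""
    if not labels_a and not labels_b:
        return 1.0
    return len(labels_a & labels_b) / len(labels_a | labels_b)

# toy usage with hypothetical frame-wise classifier outputs
frames_a = ["piano"] * 60 + ["drums"] * 30 + ["voice"] * 10
frames_b = ["piano"] * 40 + ["voice"] * 60
a, b = excerpt_labels(frames_a), excerpt_labels(frames_b)
print(a, b, label_similarity(a, b))
```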
Between Physics and Perception: Signal Models for High Level Audio Processing
The use of signal models is one of the key factors enabling us to establish high-quality signal transformation algorithms with intuitive high-level control parameters. In the present article we discuss signal models, and the signal transformation algorithms that are based on these models, in relation to the physical properties of the sound source and the properties of human sound perception. We argue that the implementation of perceptually intuitive, high-quality signal transformation algorithms requires strong links between the signal models and the perceptually relevant physical properties of the sound source. We present an overview of the history of two sound models that are used for sound transformation and show how the past and future evolution of sound transformation algorithms is driven by our understanding of the physical world.
A Shape-Invariant Phase Vocoder for Speech Transformation
This paper proposes a new method for shape-invariant real-time modification of speech signals. The method can be understood as a frequency-domain SOLA algorithm that uses the phase vocoder algorithm for phase synchronization. Compared to time-domain SOLA, the new implementation provides improved time synchronization during overlap-add and improved quality of the noise components of the transformed speech signals. The algorithm has been compared in two perceptual tests with recent implementations of PSOLA and HNM algorithms, demonstrating very satisfactory performance. Because the quality of the transformed signals stays constant over a wide range of transformation parameters, the algorithm is well suited for real-time gender and age transformations.
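For context on the phase propagation that the abstract refers to, the following is a minimal standard phase-vocoder time stretch (not shape-invariant, no transient or noise-component handling, no SOLA segment alignment); all parameter values are assumptions chosen for the toy example.

```python
import numpy as np

def pv_time_stretch(x, rate, n_fft=2048, hop_a=512):
    """Basic phase-vocoder time stretch via per-bin instantaneous frequency."""
    hop_s = int(round(hop_a * rate))                       # synthesis hop
    win = np.hanning(n_fft)
    bin_freqs = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    prev_phase = np.zeros(n_fft // 2 + 1)
    acc_phase = np.zeros(n_fft // 2 + 1)
    y = np.zeros(int(len(x) * rate) + n_fft)
    for k, i in enumerate(range(0, len(x) - n_fft, hop_a)):
        spec = np.fft.rfft(win * x[i:i + n_fft])
        phase = np.angle(spec)
        # instantaneous frequency from the phase increment over the analysis hop
        dphi = phase - prev_phase - bin_freqs * hop_a
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))   # wrap to [-pi, pi]
        inst_freq = bin_freqs + dphi / hop_a
        acc_phase += inst_freq * hop_s                     # propagate at the synthesis hop
        prev_phase = phase
        frame_out = np.fft.irfft(np.abs(spec) * np.exp(1j * acc_phase))
        j = k * hop_s
        y[j:j + n_fft] += win * frame_out                  # overlap-add
    return y[:int(len(x) * rate)]

# toy usage: stretch a 440 Hz tone to 1.5x its duration
sr = 44100
t = np.arange(sr) / sr
y = pv_time_stretch(np.sin(2 * np.pi * 440 * t), rate=1.5)
print(len(y) / sr)
```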