Compression of Head-Related Transfer Functions Using Piecewise Cubic Hermite Interpolation
We present a spline-based method for compressing and reconstructing Head-Related Transfer Functions (HRTFs) that preserves perceptual quality. Our approach focuses on the magnitude response and consists of four stages: (1) acquiring minimum-phase head-related impulse responses (HRIRs), (2) transforming them into the frequency domain and applying adaptive Wiener filtering to preserve important spectral features, (3) extracting a minimal set of control points using derivative-based methods to identify local maxima and inflection points, and (4) reconstructing the HRTF using piecewise cubic Hermite interpolation (PCHIP) over the refined control points. Evaluation on 301 subjects demonstrates that our method achieves an average compression ratio of 4.7:1 with spectral distortion ≤ 1.0 dB in each Equivalent Rectangular Band (ERB). The method preserves binaural cues with a mean absolute interaural level difference (ILD) error of 0.10 dB. Our method achieves about three times the compression obtained with a PCA-based method.
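Stages (3) and (4) of the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the finite-difference control-point picker, the one-sided end slopes, the synthetic magnitude curve, and all function names (`control_points`, `pchip_slopes`, `pchip_eval`) are assumptions made for illustration. The interior slopes use the weighted harmonic mean commonly used for PCHIP.

```python
import bisect
import math

def control_points(mag):
    """Indices of end points, local extrema and inflection points (finite differences)."""
    d1 = [mag[i + 1] - mag[i] for i in range(len(mag) - 1)]
    d2 = [d1[i + 1] - d1[i] for i in range(len(d1) - 1)]
    keep = {0, len(mag) - 1}
    for i in range(1, len(d1)):
        if d1[i - 1] * d1[i] < 0:   # slope changes sign: extremum at sample i
            keep.add(i)
    for i in range(1, len(d2)):
        if d2[i - 1] * d2[i] < 0:   # curvature changes sign between samples i and i + 1
            keep.add(i)
    return sorted(keep)

def pchip_slopes(x, y):
    """Slopes at the knots: weighted harmonic mean of neighbouring secants,
    zero wherever the secants disagree in sign (local extremum)."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    m = [0.0] * n
    m[0], m[-1] = d[0], d[-1]       # simplified one-sided end slopes (assumption)
    for i in range(1, n - 1):
        if d[i - 1] * d[i] > 0:
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
    return m

def pchip_eval(x, y, m, xq):
    """Evaluate the cubic Hermite interpolant at xq, with x[0] <= xq <= x[-1]."""
    i = max(0, min(bisect.bisect_right(x, xq) - 1, len(x) - 2))
    h = x[i + 1] - x[i]
    t = (xq - x[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y[i] + h10 * h * m[i] + h01 * y[i + 1] + h11 * h * m[i + 1]

# Synthetic stand-in for one magnitude response in dB (not measured HRTF data).
mag = [10 * math.sin(2 * math.pi * k / 64) + 3 * math.sin(2 * math.pi * k / 16)
       for k in range(129)]
idx = control_points(mag)
xs = [float(i) for i in idx]
ys = [mag[i] for i in idx]
m = pchip_slopes(xs, ys)
recon = [pchip_eval(xs, ys, m, float(k)) for k in range(len(mag))]
max_err = max(abs(a - b) for a, b in zip(mag, recon))
ratio = len(mag) / len(idx)         # sample-count ratio only, ignores index storage
print(f"kept {len(idx)} of {len(mag)} samples, max error {max_err:.3f} dB")
```

The ratio computed here counts samples only; a real codec would also have to store the control-point positions, so it should not be compared with the 4.7:1 figure reported above.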
Black-box Modeling of Distortion Circuits with Block-Oriented Models
This paper describes black-box modeling of distortion circuits. The analyzed distortion circuits all originate from guitar effect pedals, which are widely used to enrich the sound of an electric guitar with harmonics. The proposed method employs a block-oriented model which consists of a linear block (filter) and a nonlinear block. In this study the nonlinear block is represented by an extended parametric input/output mapping function. Three distortion circuits with different nonlinear elements are analyzed and modeled. The linear and nonlinear parts of the circuit are analyzed and modeled separately. The Levenberg–Marquardt algorithm is used for iterative optimization of the nonlinear parts of the circuits. Some circuits could not be modeled with high accuracy, but the proposed model has proven to be a versatile and flexible tool for modeling distortion circuits.
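The block structure described above can be sketched as a filter followed by a static mapping. This is only an illustration under stated assumptions: the abstract does not give the block order, the filter type, or the form of the mapping function, so the one-pole lowpass, the asymmetric tanh curve, and every parameter name and value below are hypothetical. In a fitting scenario the mapping parameters would be the quantities an optimizer such as Levenberg–Marquardt adjusts against measured input/output data.

```python
import math

def one_pole_lowpass(x, a):
    """Linear block: y[n] = (1 - a) * x[n] + a * y[n - 1], with 0 <= a < 1."""
    y, state = [], 0.0
    for s in x:
        state = (1.0 - a) * s + a * state
        y.append(state)
    return y

def mapping(v, pre_gain, k_pos, k_neg):
    """Nonlinear block: static, asymmetric tanh-shaped input/output curve.
    Slope is pre_gain near zero; saturation is 1/k_pos and -1/k_neg."""
    u = pre_gain * v
    k = k_pos if u >= 0 else k_neg
    return math.tanh(k * u) / k

def block_model(x, a=0.2, pre_gain=8.0, k_pos=1.0, k_neg=2.0):
    """Filter first, then map each filtered sample through the static curve."""
    return [mapping(v, pre_gain, k_pos, k_neg) for v in one_pole_lowpass(x, a)]

fs = 48000  # assumed sample rate in Hz
sine = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(480)]
out = block_model(sine)
print(f"input peak 0.5, output range [{min(out):.3f}, {max(out):.3f}]")
```

Using different curvature for positive and negative inputs is one simple way to obtain even harmonics; the paper's extended mapping function may be parameterized quite differently.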
Low bit-rate audio coding with hybrid representations
We present a general audio coder based on a structural decomposition: the signal is expanded into three features: its harmonic part, the transients, and the remaining part (referred to as the noise). The first two of these layers can be very efficiently encoded in a well-chosen basis. The noise is by construction modeled as Gaussian (colored) random noise. Furthermore, this decomposition allows good time-frequency psychoacoustic modeling, as it directly provides the tonal and nontonal parts of the signal.
Evaluation of a Stochastic Reverberation Model Based on the Source Image Principle
Various audio signal processing applications, such as source separation and dereverberation, require accurate mathematical modeling of the input audio data. In the literature, many works have focused on source signal modeling, while the reverberation model is often kept very simplistic. This paper investigates a stochastic room impulse response model presented in a previous article: the model is first adapted to discrete time, and we then propose a parametric estimation algorithm, which we evaluate experimentally. Our results show that this algorithm efficiently estimates the model parameters in various experimental settings (various signal-to-noise ratios and absorption coefficients of the room walls).
Efficient Modeling and Synthesis of Bell-like Sounds
This paper describes two different techniques that can be used to model and synthesize bell-like sounds. The first one is a source-filter model based on frequency-zooming ARMA (pole-zero) modeling techniques. The frequency-zooming approach is also powerful in modal analysis of bell sound behavior. The second technique is based on a digital waveguide with a single loop filter that is designed to generate inharmonic partials by including one or more second-order allpass sections in the loop filter, possibly augmented with one or a few parallel resonators. A small handbell with inharmonic partials was recorded and used as a target of modeling and synthesis. Sound examples are available at http://www.acoustics.hut.fi/demos/dafx02/.
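The second technique above can be sketched as a single delay-line loop whose feedback path contains one second-order allpass section. This is a toy illustration, not the paper's design: the delay length, loop gain, and allpass pole radius and angle are arbitrary assumed values, and no parallel resonators or lowpass damping are included. Because the allpass phase is not linear in frequency, the loop resonances are not evenly spaced, which is what makes the partials inharmonic.

```python
import math

def allpass2(r, theta):
    """Second-order allpass with poles at r * exp(+/- j*theta), 0 <= r < 1.
    H(z) = (a2 + a1 z^-1 + z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    s = [0.0, 0.0, 0.0, 0.0]    # x[n-1], x[n-2], y[n-1], y[n-2]
    def step(x):
        y = a2 * x + a1 * s[0] + s[1] - a1 * s[2] - a2 * s[3]
        s[1], s[0] = s[0], x
        s[3], s[2] = s[2], y
        return y
    return step

def bell_loop(n_samples, delay=50, gain=0.98, r=0.9, theta=1.0):
    """Impulse-excited loop: delay line -> allpass -> gain -> back into the line."""
    ap = allpass2(r, theta)
    line = [0.0] * delay
    line[0] = 1.0               # unit impulse as excitation
    out = []
    for n in range(n_samples):
        pos = n % delay
        x = line[pos]           # oldest sample leaves the delay line
        out.append(x)
        line[pos] = gain * ap(x)  # filtered sample re-enters the loop
    return out

samples = bell_loop(4000)
quarter = len(samples) // 4
e_first = sum(v * v for v in samples[:quarter])
e_last = sum(v * v for v in samples[-quarter:])
print(f"energy first quarter {e_first:.4f}, last quarter {e_last:.4f}")
```

Since the allpass has unit magnitude at every frequency, the loop gain alone sets the decay, so the loop stays stable for any `gain` below 1.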
Feature design for the classification of audio effect units by input/output measurements
Virtual analog modeling is an important field of digital audio signal processing. It makes it possible to recreate the tonal characteristics of real-world sound sources, or to impress the specific sound of a certain analog device upon a digital signal, in software. Automatic virtual analog modeling using black-box system identification based on input/output (I/O) measurements is an emerging approach, which can be greatly enhanced by specific pre-processing methods suggesting the best-fitting model to be optimized in the actual identification process. In this work, several features based on specific test signals are presented that allow instrument effect units to be categorized into classes of effects, such as distortion, compression, modulation, and similar categories. The categorization of analog effect units is especially challenging due to the wide variety of these effects. For each device, I/O measurements are performed and a set of features is calculated to allow the classification. The features are computed for several effect units to evaluate their applicability using a basic classifier based on pattern matching.
Hierarchical Organization and Visualization of Drum Sample Libraries
Drum samples are an important ingredient for many styles of music. Large libraries of drum sounds are readily available. However, their value is limited by the ways in which users can explore them to retrieve sounds. Available organization schemes rely on cumbersome manual classification. In this paper, we present a new approach for automatically structuring and visualizing large sample libraries through audio signal analysis. In particular, we present a hierarchical user interface for efficient exploration and retrieval based on a computational model of similarity and self-organizing maps.
Vivos Voco: A survey of recent research on voice transformations at IRCAM
IRCAM has long experience in the analysis, synthesis, and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with text-to-speech systems, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations, etc. While this approach sacrifices the possibility of attaining a specific target voice, it allows the production of new voices with a high degree of naturalness and a different gender or age, modified vocal quality, or another speech style. These transformations can be applied in real time using ircamTools TRAX. Transformation can also be done in a more specific way in order to transform a voice towards the voice of a target speaker. Finally, we present some recent research on the transformation of expressivity.
Software modules for HRTF based dynamic spatialisation
This paper describes the object-oriented design and development of software modules intended to enhance multimedia presentations with sound source spatialisation and environmental effects (reverberation), allowing dynamic reconfiguration of the input sound parameters. Implementations have been carried out on a PC platform, on top of the Win32 API. The resulting modules (in fact, C++ classes) were later integrated into a working application for demonstration purposes.
Automatic Polyphonic Piano Note Extraction Using Fuzzy Logic in a Blackboard System
This paper presents a piano transcription system that transforms audio into MIDI format. Human knowledge and psychoacoustic models are implemented in a blackboard architecture, which allows knowledge to be added with a top-down approach. The analysis is adapted to the information acquired. This technique is referred to as a prediction-driven approach, and it attempts to simulate the adaptation and prediction process taking place in human auditory perception. In this paper we describe the implementation of Polyphonic Note Recognition using a Fuzzy Inference System (FIS) as part of the knowledge sources in a blackboard system. The performance of the transcription system shows that polyphonic music transcription is still an unsolved problem, with a success rate of 45% according to the Dixon formula. However, if we consider only the transcribed notes, the success rate increases to 74%. Moreover, the results presented in [1] show that the transcription can be used successfully in a retrieval system, encouraging the authors to develop this technique for more accurate transcription results.
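For readers unfamiliar with the metric mentioned above, the Dixon score is commonly given as N / (N + FP + FN), where N is the number of correctly transcribed notes, FP the number of spurious notes, and FN the number of missed notes. The sketch below assumes that form. The second function, which ignores missed notes, is only one plausible reading of "considering only the transcribed notes", and the note counts are invented for illustration; they are not the paper's data.

```python
def dixon_score(correct, false_pos, false_neg):
    """Assumed form of the Dixon score: correct / (correct + false_pos + false_neg)."""
    return correct / (correct + false_pos + false_neg)

def transcribed_only_score(correct, false_pos):
    """Hypothetical variant: share of transcribed notes that are correct."""
    return correct / (correct + false_pos)

# Illustrative counts only (hypothetical, not taken from the paper).
correct, false_pos, false_neg = 60, 20, 40
overall = dixon_score(correct, false_pos, false_neg)
transcribed = transcribed_only_score(correct, false_pos)
print(f"overall {overall:.0%}, transcribed notes only {transcribed:.0%}")
```

The gap between the two numbers grows with the number of missed notes, which matches the direction of the 45% versus 74% figures reported above.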