Archaeological Acoustic Space Measurement for Convolution Reverberation and Auralization Applications
Developments in measuring the acoustic characteristics of concert halls and opera houses are leading to standardized methods of impulse response capture for a wide variety of auralization applications. This work presents results from a recent UK survey of non-traditional performance venues in the field of acoustic archaeology. Sites are selected and analyzed based on some feature of interest in terms of their acoustic properties. As well as providing some insight into the characteristics and construction of these spaces, the resulting database of measurements has a primary use in convolution-based reverberation and auralization. A recent sound installation based on one of the selected sites is also presented.
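The primary use named above, convolution-based reverberation, can be illustrated with a short generic sketch: a dry signal is convolved with a measured impulse response via the FFT and blended with the direct path. This is not the survey's tooling; the synthetic test signals, the normalisation, and the wet/dry `mix` parameter are assumptions.

```python
# Minimal sketch of convolution reverberation with a measured impulse
# response (IR). Generic illustration only: the synthetic test signals,
# the normalisation, and the wet/dry `mix` parameter are assumptions.
import numpy as np

def convolution_reverb(dry, ir, mix=0.5):
    """Convolve a dry signal with a measured IR and blend wet/dry paths."""
    n = len(dry) + len(ir) - 1
    nfft = 1 << (n - 1).bit_length()          # next power of two for the FFT
    wet = np.fft.irfft(np.fft.rfft(dry, nfft) * np.fft.rfft(ir, nfft), nfft)[:n]
    wet /= np.max(np.abs(wet)) + 1e-12        # normalise to avoid clipping
    out = np.zeros(n)
    out[:len(dry)] += (1.0 - mix) * dry       # direct (dry) path
    out += mix * wet                          # reverberant (wet) path
    return out

if __name__ == "__main__":
    fs = 44100
    dry = np.random.randn(fs)                                   # 1 s test signal
    ir = np.exp(-np.arange(fs) / 8000.0) * np.random.randn(fs)  # synthetic decaying IR
    print(convolution_reverb(dry, ir, mix=0.4).shape)           # (2 * fs - 1,)
```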
Improved Cocktail-Party Processing
The human auditory system is able to focus on one speech signal and ignore other speech signals in an auditory scene where several conversations are taking place. This ability of the human auditory system is referred to as the “cocktail-party effect”. This property of human hearing is made possible in part by binaural listening. Interaural time differences (ITDs) and interaural level differences (ILDs) between the ear input signals are the two most important binaural cues for the localization of sound sources, i.e. the estimation of source azimuth angles. This paper proposes an implementation of a cocktail-party processor. The proposed cocktail-party processor carries out an auditory scene analysis by estimating the binaural cues corresponding to the directions of the sources and then, as a function of these cues, suppresses components of signals arriving from undesired directions using speech enhancement techniques. The performance of the proposed algorithm is assessed in terms of directionality and speech quality. The proposed algorithm improves on existing cocktail-party processors in that it combines low computational complexity with efficient source separation. Moreover, the advantage of this cocktail-party processor over conventional beamforming is that it achieves a highly directional beam over a wide frequency range using only two microphones.
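As a rough illustration of the two cues mentioned above (not the paper's processor), the sketch below estimates an ITD from the lag of maximum cross-correlation over physically plausible lags and an ILD as a left/right frame energy ratio; the ±1 ms lag bound, the frame handling, and the variable names are assumptions.

```python
# Rough sketch of binaural cue estimation for one frame of ear signals:
# ITD from the lag of maximum cross-correlation, ILD from the left/right
# energy ratio in dB. Not the paper's processor; the lag bound (~1 ms)
# and frame handling are illustrative assumptions.
import numpy as np

def binaural_cues(left, right, fs, max_itd=1e-3):
    """Return (ITD in seconds, ILD in dB) for one frame of ear signals."""
    max_lag = int(max_itd * fs)                     # ~1 ms covers human head sizes
    lags = np.arange(-max_lag, max_lag + 1)
    core = slice(max_lag, len(left) - max_lag)      # region valid for every lag
    xcorr = [np.dot(left[core], np.roll(right, lag)[core]) for lag in lags]
    itd = lags[int(np.argmax(xcorr))] / fs          # lag of maximum correlation
    ild = 10.0 * np.log10((np.sum(left ** 2) + 1e-12) /
                          (np.sum(right ** 2) + 1e-12))
    return itd, ild

if __name__ == "__main__":
    fs, f0 = 16000, 500.0
    t = np.arange(1024) / fs
    left = np.sin(2 * np.pi * f0 * t)
    right = 0.7 * np.sin(2 * np.pi * f0 * (t - 0.0005))  # delayed, attenuated copy
    print(binaural_cues(left, right, fs))                # |ITD| ~ 0.5 ms, ILD ~ +3 dB
```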
A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues
In this paper we propose a complete computational system for Auditory Scene Analysis. This time-frequency system localizes, separates, and spatializes an arbitrary number of audio sources given only binaural signals. The localization is based on recent research frameworks in which interaural level and time differences are combined to derive a confident direction of arrival (azimuth) at each frequency bin. Here, the power-weighted histogram constructed in the azimuth space is modeled as a Gaussian Mixture Model, whose parameter structure is revealed through a weighted Expectation-Maximization algorithm. A bank of Gaussian spatial filters is then configured automatically to extract the sources with significant energy according to a posterior probability. In this frequency-domain framework, we also invert a geometrical and physical head model to derive an algorithm that simulates a source as originating from any azimuth angle.
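The sketch below gives a generic picture of the two steps described above, under assumptions of our own: a weighted EM fit of a one-dimensional GMM to per-bin azimuth estimates (weighted by bin power), and soft separation masks taken as the component posterior probabilities, standing in for the bank of Gaussian spatial filters. The fixed component count K and the initialization are illustrative.

```python
# Illustrative sketch (not the authors' code): weighted EM fit of a 1-D
# Gaussian Mixture Model to per-bin azimuth estimates, followed by soft
# masks given by the component posterior probabilities. K and the
# initialization are assumptions.
import numpy as np

def weighted_em_gmm(azimuths, weights, K=2, n_iter=50):
    """Fit a K-component 1-D GMM to azimuth samples with power weights."""
    w = weights / np.sum(weights)
    mu = np.quantile(azimuths, np.linspace(0.1, 0.9, K))   # spread initial means
    var = np.full(K, np.var(azimuths) / K + 1e-6)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each azimuth sample.
        dens = pi * np.exp(-0.5 * (azimuths[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: update parameters using the bin-power weights.
        rw = resp * w[:, None]
        nk = rw.sum(axis=0) + 1e-12
        mu = (rw * azimuths[:, None]).sum(axis=0) / nk
        var = (rw * (azimuths[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / nk.sum()
    return pi, mu, var

def posterior_masks(azimuths, pi, mu, var):
    """Posterior probability of each component at each time-frequency bin,
    used here as soft separation masks (a stand-in for the Gaussian spatial filters)."""
    dens = pi * np.exp(-0.5 * (azimuths[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return dens / (dens.sum(axis=1, keepdims=True) + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    az = np.concatenate([rng.normal(-30, 5, 500), rng.normal(40, 5, 500)])  # two sources
    pw = rng.uniform(0.1, 1.0, az.shape)                                    # bin powers
    pi, mu, var = weighted_em_gmm(az, pw, K=2)
    masks = posterior_masks(az, pi, mu, var)        # shape (n_bins, K)
    print(np.round(np.sort(mu), 1))                 # means near -30 and 40 degrees
```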
Assessing the Quality of the Extraction and Tracking of Sinusoidal Components: Towards an Evaluation Methodology
In this paper, we introduce two original evaluation methods in the context of sinusoidal modeling. The first assesses the quality of the extraction of sinusoidal components from short-time signals, whereas the second focuses on the quality of the tracking of these sinusoidal components over time. Each proposed method relies on a single cost function that globally reflects the performance of the tested algorithm in a realistic framework. Clearly defined evaluation protocols are then proposed, with several test cases covering most of the desired properties of extractors or trackers of sinusoidal components. This paper is a first proposal, intended as a starting point for a sinusoidal analysis/synthesis contest to be held at DAFx’07.
Sinusoidal Extraction Using an Efficient Implementation of a Multi-Resolution FFT
This paper provides a detailed description of the spectral analysis front-end of a melody extraction algorithm. Our approach aims at extracting the sinusoidal components from the audio signal. It includes a novel technique for the efficient computation of STFT spectra at different time-frequency resolutions. Furthermore, we apply local sinusoidality criteria to detect stable sinusoids in individual FFT frames. The evaluation results show that a multi-resolution analysis improves sinusoidal extraction in polyphonic audio.
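For context, a naive multi-resolution STFT can be written as below: the same signal is analysed with several window lengths, trading time resolution against frequency resolution. The paper's contribution is an efficient shared computation of such spectra; this sketch simply computes each resolution independently, and the window lengths and hop size are arbitrary choices.

```python
# Naive multi-resolution STFT sketch: the same signal is analysed with
# several window lengths (longer windows give finer frequency resolution,
# shorter windows finer time resolution). Each resolution is computed
# independently here; the efficient shared computation is the paper's
# contribution and is not reproduced.
import numpy as np

def multires_stft(x, win_lengths=(512, 1024, 2048, 4096), hop=256):
    """Return a dict mapping window length -> complex STFT matrix
    of shape (num_frames, win_length // 2 + 1)."""
    spectra = {}
    for n in win_lengths:
        win = np.hanning(n)
        frames = [np.fft.rfft(x[start:start + n] * win)
                  for start in range(0, len(x) - n + 1, hop)]
        spectra[n] = np.array(frames)
    return spectra

if __name__ == "__main__":
    fs = 44100
    t = np.arange(2 * fs) / fs
    x = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 523.25 * t)
    S = multires_stft(x)
    print({n: S[n].shape for n in S})
```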
High Accuracy Frame-by-Frame Non-Stationary Sinusoidal Modelling
This paper describes techniques for obtaining high accuracy estimates, including those of non-stationarity, of parameters for sinusoidal modelling using a single frame of analysis data. In this case the data used is generated from the time and frequency reassigned short-time Fourier transform (STFT). Such a system offers the potential for quasi real-time (frame-by-frame) spectral modelling of audio signals.
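As a rough, single-frame illustration of time-frequency reassignment (a generic sketch, not the authors' estimator), three DFTs of the same frame are taken, using the analysis window h, the time-weighted window t·h, and the window derivative dh/dt; ratios of these transforms then yield per-bin time and frequency corrections. The Hann window and the numerical derivative are assumptions.

```python
# Single-frame time-frequency reassignment sketch (generic, not the
# authors' estimator). Three DFTs of the frame are taken: with the window
# h, with the time-weighted window t*h, and with the derivative dh/dt.
# Ratios of these transforms give per-bin time and frequency corrections.
import numpy as np

def reassign_frame(frame, fs):
    """Return (time offsets in s from the frame centre, reassigned
    frequencies in Hz, bin magnitudes) for one analysis frame."""
    n = len(frame)
    h = np.hanning(n)
    t = (np.arange(n) - n // 2) / fs              # time axis centred on the window
    th = t * h                                    # time-weighted window
    dh = np.gradient(h) * fs                      # numerical derivative dh/dt (1/s)
    X = np.fft.rfft(frame * h)
    Xt = np.fft.rfft(frame * th)
    Xd = np.fft.rfft(frame * dh)
    eps = 1e-12
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    t_hat = np.real(Xt / (X + eps))                          # local group delay (s)
    f_hat = freqs - np.imag(Xd / (X + eps)) / (2 * np.pi)    # corrected frequency (Hz)
    return t_hat, f_hat, np.abs(X)

if __name__ == "__main__":
    fs, f0 = 44100, 1000.3
    frame = np.cos(2 * np.pi * f0 * np.arange(2048) / fs)
    t_hat, f_hat, mag = reassign_frame(frame, fs)
    k = int(np.argmax(mag))
    print(round(f_hat[k], 1))   # close to 1000.3 Hz despite ~21.5 Hz bin spacing
```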
Granular Resynthesis for Sound Unmixing
In modern music genres such as Pop, Rap, Hip-Hop, and Techno, many songs are built from a pool of small musical segments, so-called loops, used as building blocks. These loops are usually one, two, or four bars long and form the accompaniment for the lead melody or singing voice. Very often an accompanying loop can be heard solo at least once in a song; this can be used as a priori knowledge for removing the loop from the mixture. In this paper an algorithm based on granular resynthesis and spectral subtraction is presented which makes use of this a priori knowledge. The algorithm uses two different synthesis strategies and is capable of removing known loops from mixtures even if the loop signal contained in the mixture differs slightly from the solo loop signal.
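The spectral-subtraction part of such a scheme can be sketched as follows, under assumptions of our own: the known loop is taken to be already tiled and time-aligned with the mixture, its magnitude spectrum is subtracted frame by frame with a spectral floor, and the result is resynthesised with the mixture phase by overlap-add. The frame size, hop, over-subtraction factor, and floor are illustrative choices, and the paper's granular-resynthesis strategies are not reproduced here.

```python
# Simplified spectral-subtraction sketch for removing a known loop from a
# mixture. Assumes `aligned_loop` is already tiled and time-aligned to the
# mixture; frame size, hop, over-subtraction factor `alpha`, and the 5%
# spectral floor are illustrative choices.
import numpy as np

def subtract_loop(mixture, aligned_loop, n_fft=2048, hop=512, alpha=1.0):
    """Magnitude spectral subtraction with mixture-phase resynthesis."""
    win = np.hanning(n_fft)
    out = np.zeros(len(mixture))
    norm = np.full(len(mixture), 1e-12)
    for start in range(0, len(mixture) - n_fft + 1, hop):
        seg = slice(start, start + n_fft)
        M = np.fft.rfft(mixture[seg] * win)
        L = np.fft.rfft(aligned_loop[seg] * win)
        mag = np.maximum(np.abs(M) - alpha * np.abs(L), 0.05 * np.abs(M))  # floor
        frame = np.fft.irfft(mag * np.exp(1j * np.angle(M)), n_fft)
        out[seg] += frame * win              # weighted overlap-add
        norm[seg] += win ** 2
    return out / norm
```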
Extraction and Removal of Percussive Sounds from Musical Recordings
Automated removal and extraction (isolation) of percussive sounds embedded in an audio signal is useful for a variety of applications, such as speech enhancement and music processing effects. A novel method is presented to accomplish both extraction and removal of beats, using an adaptive filter based on the LMS algorithm. Empirical evaluation is undertaken using computer-generated music with a mix of natural voice and a repeating drum, and shows that the efficacy of the system is robust to different sound processing techniques such as non-linear distortion and tempo jitter.
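A generic adaptive-filter sketch in the spirit of this description is given below; it uses a normalized LMS (NLMS) update for step-size robustness and is not the paper's exact configuration. The filter predicts the primary input from a reference signal, and the prediction error is the output with the predicted component removed; using a copy of the recording delayed by the drum's repetition period as the reference is one plausible way to target a repeating percussive component. Filter length and step size are illustrative.

```python
# Generic normalized-LMS (NLMS) adaptive filter sketch, not the paper's
# exact configuration. The filter predicts the primary input from a
# reference signal; the prediction error is the output with the predicted
# (e.g. repeating percussive) component removed.
import numpy as np

def nlms(primary, reference, n_taps=64, mu=0.5):
    """Return (error, prediction); error = primary minus filtered reference."""
    w = np.zeros(n_taps)
    err = np.zeros(len(primary))
    pred = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]              # most recent sample first
        pred[n] = np.dot(w, x)
        err[n] = primary[n] - pred[n]
        w += mu * err[n] * x / (np.dot(x, x) + 1e-12)  # normalized weight update
    return err, pred

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    period = 2000                                   # repetition period in samples
    pattern = np.zeros(period)
    pattern[:200] = rng.standard_normal(200)        # short percussive burst
    drum = np.tile(pattern, 10)                     # strictly repeating drum track
    other = 0.1 * rng.standard_normal(drum.size)    # non-repeating material
    mixture = drum + other
    reference = np.roll(mixture, period)            # mixture delayed by one period
    cleaned, removed = nlms(mixture, reference)
    print(np.std(mixture), np.std(cleaned[5000:]))  # level drops as the drum is attenuated
```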
Representations of Audio Signals in Overcomplete Dictionaries: What is the Link Between Redundancy Factor and Coding Properties?
This paper addresses the link between the size of the dictionary in overcomplete decompositions of signals and the rate-distortion properties obtained when such decompositions are used for audio coding. We have performed several experiments with sets of nested dictionaries showing that very redundant shift-invariant and multi-scale dictionaries have a clear benefit at low bit-rates; however, at very low distortion many atoms have to be encoded, in which case orthogonal transforms such as the MDCT give better results.
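To make the trade-off concrete, the toy sketch below runs a plain matching pursuit over a small overcomplete dictionary (a union of windowed cosine atoms at two scales): the residual energy after encoding a given number of atoms stands in for distortion at a given rate. The dictionary construction, scales, and atom budgets are our own illustrative assumptions, not the nested dictionaries used in the paper's experiments.

```python
# Toy matching-pursuit sketch over an overcomplete dictionary, to make the
# "atoms encoded vs. distortion" trade-off concrete. The dictionary (a
# union of windowed cosine atoms at two scales), the scales, and the atom
# budgets are illustrative assumptions, not the paper's setup.
import numpy as np

def cosine_dictionary(n, scales=(32, 256)):
    """Stack unit-norm windowed cosine atoms of several lengths as columns."""
    atoms = []
    for s in scales:
        win = np.sin(np.pi * (np.arange(s) + 0.5) / s)    # sine window
        for start in range(0, n - s + 1, s // 2):          # 50%-overlapped placements
            for k in range(s):
                a = np.zeros(n)
                a[start:start + s] = win * np.cos(np.pi * (np.arange(s) + 0.5) * (k + 0.5) / s)
                atoms.append(a / np.linalg.norm(a))
    return np.column_stack(atoms)                          # shape (n, n_atoms)

def matching_pursuit(x, D, n_atoms=50):
    """Greedily pick the atom most correlated with the residual and subtract it."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual
        i = int(np.argmax(np.abs(corr)))
        coeffs[i] += corr[i]
        residual = residual - corr[i] * D[:, i]
    return coeffs, residual

if __name__ == "__main__":
    n = 1024
    t = np.arange(n)
    x = np.sin(2 * np.pi * 0.03 * t) + 0.5 * np.sin(2 * np.pi * 0.11 * t)
    D = cosine_dictionary(n)
    for budget in (5, 20, 80):
        _, r = matching_pursuit(x, D, n_atoms=budget)
        print(budget, float(np.sum(r ** 2) / np.sum(x ** 2)))  # residual energy ratio falls
```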
A Spatial Interface for Audio and Music Production
In an effort to find a better-suited interface for musical performance, a novel approach has been developed. At the heart of this approach is the concept of physical interaction with sound in space, where sound processing occurs at various 3-D locations and sound signals are sent from one area to another based on physical models of sound propagation. The control is based on a gestural vocabulary that is familiar to users, involving natural spatial interaction such as translating, rotating, and pointing in 3-D. This research presents a framework for real-time control of 3-D audio and describes how to construct audio scenes to accomplish various musical tasks. The generality and effectiveness of this approach have enabled us to re-implement several conventional applications with the benefit of a substantially more powerful interface, and have further led to the conceptualization of several novel applications.