A Comparison Between Fixed and Multiresolution Analysis for Onset Detection in Musical Signals
This study examines multiresolution analysis for onset detection in the complex domain. It shows that varying the time resolution across frequency bands generates sharper detection functions for higher bands and more accurate detection functions for lower bands. The resulting method localises onsets more precisely than fixed-resolution schemes by favouring the increased time precision of the higher subbands when the results are combined.
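The abstract does not spell out the detection function, so the following is only a sketch of one commonly used complex-domain formulation: each bin of the current frame is predicted from the previous frame's magnitude and a linearly extrapolated phase, and the detection value is the summed prediction error. The names `dft` and `complex_domain_odf`, the frame size, and the hop are illustrative assumptions; a multiresolution scheme would run this kind of function with different frame sizes per subband.

```python
import cmath
import math

def dft(frame):
    """Naive DFT of one frame (adequate for the short frames used here)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]

def complex_domain_odf(signal, frame_size, hop):
    """Return one detection value per frame (0.0 for the first two frames)."""
    starts = range(0, len(signal) - frame_size + 1, hop)
    spectra = [dft(signal[s:s + frame_size]) for s in starts]
    eta = [0.0] * len(spectra)
    for n in range(2, len(spectra)):
        total = 0.0
        for k in range(frame_size):
            prev, prev2 = spectra[n - 1][k], spectra[n - 2][k]
            # Prediction: magnitude of the previous frame, phase advanced by
            # the previous frame-to-frame phase difference.
            phase = 2.0 * cmath.phase(prev) - cmath.phase(prev2)
            predicted = cmath.rect(abs(prev), phase)
            total += abs(spectra[n][k] - predicted)
        eta[n] = total
    return eta

# Illustrative input: silence, then a steady sinusoid starting at sample 256.
FRAME, HOP = 64, 16
signal = [0.0] * 256 + [math.sin(2 * math.pi * 5 * t / FRAME) for t in range(256)]
eta = complex_domain_odf(signal, FRAME, HOP)
```

In this toy input the detection value is zero during silence, rises in the frames that straddle the onset, and returns to near zero once the sinusoid is steady, because a stationary partial is predicted almost exactly.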
Piano Transcription Using Pattern Recognition: Aspects on Parameter Extraction
A method for chord recognition for piano transcription has been previously presented by the authors. The method has some limitations due to errors in the parameter extraction carried out during the training process. Parameter extraction of piano notes is not as straightforward as it might seem. Detecting spectral components is necessary but not sufficient to obtain some note parameters accurately. The inharmonicity coefficient B is one of the parameters that is difficult to evaluate: the value obtained differs for every partial used to calculate it, and the differences are sometimes large. Tuning with respect to the tempered scale is another important note parameter. Problems arise when measuring the tuning of a note in octaves 0 or 1, because the soundboard radiates the fundamental at a very low level, so the recording microphone does not capture it and it cannot be measured. This paper presents a method that avoids these drawbacks, together with an explanation of its basis.
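To illustrate why B is sensitive to measurement error, here is a minimal sketch that assumes the standard stiff-string model f_n = n·f0·√(1 + B·n²). The abstract does not state which model or estimator the authors use, so the formula, the function names, and the numeric values below are assumptions chosen for illustration only.

```python
import math

def partial_frequency(n, f0, b):
    """Frequency of partial n under the assumed stiff-string model."""
    return n * f0 * math.sqrt(1.0 + b * n * n)

def estimate_b(n, f_n, m, f_m):
    """Solve the model for B from two measured partials n and m."""
    r = (f_n * m / (f_m * n)) ** 2
    return (r - 1.0) / (n * n - r * m * m)

def fundamental_from_partial(n, f_n, b):
    """Recover f0 from a higher partial when the fundamental is not measurable."""
    return f_n / (n * math.sqrt(1.0 + b * n * n))

# Illustrative values: a note near A0 with an assumed inharmonicity.
F0_TRUE, B_TRUE = 27.5, 3e-4
measured = {n: partial_frequency(n, F0_TRUE, B_TRUE) for n in range(2, 7)}

b_clean = estimate_b(3, measured[3], 2, measured[2])
f0_est = fundamental_from_partial(4, measured[4], b_clean)

# The same estimate when partial 3 is measured 0.1 Hz too high.
b_perturbed = estimate_b(3, measured[3] + 0.1, 2, measured[2])
```

With exact partial frequencies the model recovers B and the missing fundamental. With a 0.1 Hz error on one low partial, the estimate of B more than doubles in this example, which is consistent with the spread across partials that the abstract describes.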
On Finding Melodic Lines in Audio Recordings
The paper presents our approach to the problem of finding melodic line(s) in polyphonic audio recordings. The approach is composed of two stages, partially rooted in psychoacoustic theories of music perception: the first stage is dedicated to finding regions with strong and stable pitch (melodic fragments), while in the second stage these fragments are grouped according to their properties (pitch, loudness...) into clusters that represent the melodic lines of the piece. The Expectation Maximization algorithm is used in both stages: to find the dominant pitch in a region, and to train the Gaussian Mixture Models that group fragments into melodies. The paper presents the entire process in more detail and provides some initial results.
Musical Instrument Identification in Continuous Recordings
Recognition of musical instruments in multi-instrumental, polyphonic music is a difficult challenge that is still far from solved. Successful instrument recognition techniques for solos (monophonic or polyphonic recordings of single instruments) can help with this task. We introduce an instrument recognition process for solo recordings of a set of instruments (bassoon, clarinet, flute, guitar, piano, cello and violin), which yields a high recognition rate. A large and very diverse solo database (108 different solos, all by different performers) is used in order to encompass the different sound possibilities of each instrument and to evaluate the generalization abilities of the classification process. We first report classification results using a very extensive collection of features (62 different feature types), and then use our GDE feature selection algorithm to select a smaller feature set with a relatively short computation time, which allows us to perform instrument recognition in solos in real time with only a slight decrease in recognition rate. We demonstrate that our real-time solo classifier can also be useful for instrument recognition in duet performances, and that it can be improved using simple “source reduction”.
Bayesian Identification of Closely-Spaced Chords from Single-Frame STFT Peaks
Identifying chords and related musical attributes from digital audio has proven a long-standing problem spanning many decades of research. A robust identification may facilitate automatic transcription, semantic indexing, polyphonic source separation and other emerging applications. To this end, we develop a Bayesian inference engine operating on single-frame STFT peaks. Peak likelihoods conditional on pitch component information are evaluated by an MCMC approach accounting for overlapping harmonics as well as undetected/spurious peaks, thus facilitating operation in noisy environments at very low computational cost. Our inference engine evaluates posterior probabilities of musical attributes such as root, chroma (including inversion), octave and tuning, given STFT peak frequency and amplitude observations. The resultant posteriors become highly concentrated around the correct attributes, as demonstrated using 227 ms piano recordings with −10 dB additive white Gaussian noise.
A New Score Function for Joint Evaluation of Multiple F0 Hypotheses
This article is concerned with estimating the fundamental frequencies of the quasiharmonic sources in polyphonic signals when the number of sources is known. We propose a new method for jointly evaluating multiple F0 hypotheses based on three physical principles: harmonicity, spectral smoothness and synchronous amplitude evolution within a single source. Given the observed spectrum, a set of F0 candidates is listed, and for any hypothetical combination among the candidates the corresponding hypothetical partial sequences are derived. The hypothetical partial sequences are then evaluated using a score function that formulates the guiding principles in mathematical form. The algorithm has been tested on a large collection of artificially mixed polyphonic samples, and the encouraging results demonstrate the competitive performance of the proposed method.
Sound Source Separation: Azimuth Discrimination and Resynthesis
In this paper we present a novel sound source separation algorithm which requires no prior knowledge, no learning, assisted or otherwise, and performs the task of separation based purely on azimuth discrimination within the stereo field. The algorithm exploits the use of the pan pot as a means to achieve image localisation within stereophonic recordings. As such, only an interaural intensity difference exists between left and right channels for a single source. We use gain scaling and phase cancellation techniques to expose frequency-dependent nulls across the azimuth domain, from which source separation and resynthesis are carried out. We present results obtained from real recordings, and show that for musical recordings, the algorithm improves upon the output quality of current source separation schemes.
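The gain-scaling idea can be sketched on a handful of frequency bins: if a source appears in both channels with only an intensity difference, subtracting a scaled copy of one channel from the other cancels it when the scale matches the pan ratio. The sketch below covers only the null-finding step, not resynthesis. It assumes sources panned so that the left/right ratio lies between 0 and 1, and sources that do not share bins; the function name `null_positions`, the resolution `beta`, and the bin values are hypothetical.

```python
def null_positions(left_bins, right_bins, beta=100):
    """For each bin, return the index i whose gain i/beta best cancels the bin."""
    positions = []
    for l_bin, r_bin in zip(left_bins, right_bins):
        # Residual magnitude after subtracting the scaled right channel.
        residual = [abs(l_bin - (i / beta) * r_bin) for i in range(beta + 1)]
        positions.append(min(range(beta + 1), key=residual.__getitem__))
    return positions

# Illustrative complex spectra for two sources occupying different bins.
source_a = [1 + 2j, -0.5 + 0.3j, 0j, 0j]
source_b = [0j, 0j, 0.7 - 1.1j, 2 + 0.4j]
GAIN_A, GAIN_B = 0.3, 0.8  # assumed left/right intensity ratios

right = [a + b for a, b in zip(source_a, source_b)]
left = [GAIN_A * a + GAIN_B * b for a, b in zip(source_a, source_b)]

nulls = null_positions(left, right)
```

Here the bins belonging to the first source produce a null at index 30 and those of the second at index 80, so grouping bins by null position separates the two sources. Real recordings have overlapping spectra, so the nulls are less clean than in this example.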
Source Separation for WFS Acoustic Opening Applications
This paper proposes a new scheme to reduce the coding bit rate in array-based multichannel audio applications such as the acoustic opening, which can be used in modern teleconference systems. The combination of beamforming techniques for source separation and wave field synthesis allows a significant reduction in coding bit rate. To evaluate the quality of this new scheme, both objective and subjective tests have been carried out. The objective measurement system is based on the Perceptual Audio Quality Measure of the binaural signal that the listener would perceive in a real environment.
Analysis of Certain Challenges for the Use of Wave Field Synthesis in Concert-Based Applications
Wave Field Synthesis (WFS) provides a means for reproducing 3D sound fields over an extended area. Beyond conventional audio reproduction applications, present research at IRCAM involves augmenting the realism of concert-based applications in which real musicians will be interacting on stage with virtual sources reproduced by WFS. The challenge in such a situation is to create virtual sound sources which behave as closely as possible to real sound sources, in order to obtain a natural balance between real and virtual sources. The goal of this article is to point out the physical differences between real sound sources and WFS-reproduced sources situated at the same position, considering in turn the sound field associated with the direct sound of the virtual source and its interaction with the room. Methods for taking these differences into account and compensating for them are proposed.
A Maximum Likelihood Approach to Blind Audio De-Reverberation
Blind audio de-reverberation is the problem of removing reverb from an audio signal without having explicit data regarding the system and/or the input signal. It is a more difficult signal-processing task than ordinary de-reverberation based on deconvolution. In this paper, different blind de-reverberation algorithms derived from kurtosis maximization and a maximum likelihood approach are analyzed and implemented.
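As a toy illustration of the kurtosis-maximization idea (not the algorithms analyzed in the paper), the sketch below adds a single echo to a sparse impulse train and then searches for the inverse-filter coefficient that maximizes the kurtosis of the output. The echo delay is assumed known here, which a truly blind method would not have. All function names and parameter values are hypothetical.

```python
def kurtosis(x):
    """Normalised fourth central moment m4 / m2^2."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)

def add_echo(x, gain, delay):
    """y[i] = x[i] + gain * x[i - delay]: a one-tap stand-in for reverb."""
    return [v + (gain * x[i - delay] if i >= delay else 0.0)
            for i, v in enumerate(x)]

def inverse_echo(y, coeff, delay):
    """Recursive filter z[i] = y[i] - coeff * z[i - delay]."""
    z = []
    for i, v in enumerate(y):
        z.append(v - (coeff * z[i - delay] if i >= delay else 0.0))
    return z

DELAY = 37  # assumed known in this toy example
dry = [1.0 if i % 100 == 0 else 0.0 for i in range(2000)]
wet = add_echo(dry, 0.6, DELAY)

# Grid search over the inverse-filter coefficient, keeping the one whose
# output has the highest kurtosis.
candidates = [i / 10 for i in range(10)]
best = max(candidates, key=lambda c: kurtosis(inverse_echo(wet, c, DELAY)))
```

The echo lowers the kurtosis of the sparse input, and the search recovers the coefficient that matches the echo gain, at which point the filtered signal equals the dry one. Practical methods adapt many filter taps rather than searching a single coefficient.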