Comparison of SRP-PHAT and Multiband-PoPi Algorithms for Speaker Localization Using Particle Filters

The task of localizing single and multiple concurrent speakers in a reverberant environment with background noise poses several problems. One of the major problems is the severe corruption of the frame-wise localization estimates. To improve the overall localization accuracy, we propose a particle-filter-based tracking algorithm that uses the recently proposed Multiband Joint Position-Pitch (M-PoPi) localization algorithm as a frame-wise likelihood estimate. To demonstrate the performance of our approach, we tested it on real-world recordings of seven different speakers and of up to three concurrent speakers. We compared our new approach to the well-known SRP-PHAT algorithm used as the frame-wise likelihood estimate. Finally, we compared both particle-filter-based tracking algorithms with their frame-wise localization algorithms. The M-PoPi-based particle filter tracking algorithm outperforms the SRP-PHAT-based one. The comparison with the frame-wise localization algorithms shows that this improved performance stems from the more robust M-PoPi frame-wise localization estimate.
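As a rough illustration of how a frame-wise localization score plugs into such a tracker, the sketch below implements one cycle of a generic bootstrap particle filter in Python. The random-walk motion model, systematic resampling, and the `likelihood` placeholder (standing in for the SRP-PHAT or M-PoPi response at a candidate position) are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, motion_std=0.05):
    """One predict/update/resample cycle of a bootstrap particle filter.
    particles: (n, d) candidate source positions; likelihood(p) is the
    frame-wise localization score evaluated at position p (assumed)."""
    # Predict: propagate particles with a random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Update: reweight each particle by the frame-wise localization
    # likelihood (e.g. the SRP-PHAT or M-PoPi response) at its position.
    weights = weights * np.array([likelihood(p) for p in particles])
    weights = weights / (weights.sum() + 1e-12)
    # Resample (systematic) to counter weight degeneracy.
    n = len(weights)
    u = (np.arange(n) + np.random.uniform()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), u), n - 1)
    return particles[idx], np.full(n, 1.0 / n)
```

The posterior position estimate is then typically taken as the weighted mean of the particles after the update step.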
A Real-Time System for Multiple Acoustic Sources Localization Based on ISP Comparison

The growing demand for automatic surveillance systems that integrate different types of sensors, including microphones, requires adapting and optimizing the well-studied techniques of acoustic source localization to meet the constraints imposed by the new application scenario. In this paper, we present a real-time prototype for multiple acoustic source localization in a far-field and free-field environment. The prototype is composed of two linear arrays and uses an innovative approach for the localization of multiple sources. The algorithm is based on two steps: i) the separation of the sources by means of beamforming techniques and ii) the comparison of the power spectra by means of a spectral distance measure. The prototype was successfully tested in a real environment.
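A minimal Python sketch of the two steps, assuming a far-field delay-and-sum beamformer and a log-spectral distance; the paper's actual beamformer design and spectral distance measure may differ.

```python
import numpy as np

def das_beamform(frames, mic_pos, angle, fs, c=343.0):
    """Frequency-domain delay-and-sum beamformer for a linear array.
    frames: (n_mics, n_samples) snapshot; mic_pos: mic x-coordinates in
    metres; angle: steering direction in radians. Far-field plane-wave
    assumption, matching the paper's free-field scenario; the delay
    sign convention depends on the chosen geometry."""
    n = frames.shape[1]
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    delays = mic_pos * np.cos(angle) / c              # per-mic delays [s]
    X = np.fft.rfft(frames, axis=1)
    steer = np.exp(2j * np.pi * np.outer(delays, freqs))
    return (X * steer).mean(axis=0)                   # aligned average

def log_spectral_distance(S1, S2, eps=1e-12):
    """One plausible spectral distance for comparing the power spectra
    of two beamformed outputs (an illustrative choice, not necessarily
    the paper's measure)."""
    d = 10 * np.log10((np.abs(S1) ** 2 + eps) / (np.abs(S2) ** 2 + eps))
    return np.sqrt(np.mean(d ** 2))
```

In a two-array setup, candidate directions from each array would be paired by minimizing this distance between the corresponding beamformed spectra.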
Template-Based Estimation of Tempo: Using Unsupervised or Supervised Learning to Create Better Spectral Templates

In this paper, we study tempo estimation using spectral templates obtained by unsupervised or supervised learning from a tempo-annotated database. More precisely, we study the inclusion of these templates in our tempo estimation algorithm of [1]. For this, we consider as periodicity observation a 48-dimensional vector obtained by sampling the amplitude of the DFT at tempo-related frequencies. We name it the spectral template. A set of reference spectral templates is then learned in an unsupervised or supervised way from an annotated database. These reference spectral templates, combined with all the possible tempo assumptions, constitute the hidden states, which we decode using a Viterbi algorithm. Experiments are then performed on the “ballroom dancer” test set, which allows us to conclude that the method improves over the state of the art. In particular, we discuss the use of prior tempo probabilities. It should be noted, however, that these results are only indicative, considering that the training set and test set are the same in this preliminary experiment.
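One plausible way to build such a 48-dimensional spectral template is sketched below in Python; the onset-energy input and the exact grid of tempo-related frequencies are assumptions, since the paper defers to the algorithm of [1] for these details.

```python
import numpy as np

def spectral_template(onset_env, frame_rate, tempo_bpm, n_dims=48):
    """Sample the DFT amplitude of an onset-energy function at
    frequencies tied to a tempo hypothesis. The grid used here (the
    first n_dims multiples of half the beat frequency) is an
    illustrative assumption, not the paper's exact definition."""
    n = len(onset_env)
    spectrum = np.abs(np.fft.rfft(onset_env))
    freqs = np.fft.rfftfreq(n, d=1.0 / frame_rate)    # Hz
    f_beat = tempo_bpm / 60.0                         # beat frequency
    grid = f_beat * 0.5 * np.arange(1, n_dims + 1)    # tempo-related freqs
    # Sample the amplitude spectrum at the nearest DFT bin (approximate).
    idx = np.clip(np.searchsorted(freqs, grid), 0, len(freqs) - 1)
    return spectrum[idx]
```

Each (reference template, tempo) pair then defines one hidden state for the Viterbi decoding described in the abstract.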
A High-Level Audio Feature for Music Retrieval and Sorting

We describe an audio analysis method to create a high-level audio annotation, expressed as a single scalar. Typically, low values of this feature indicate songs with dominant harmonic elements, while high values indicate the dominance of mainly percussive or drum-like sounds. The proposed feature is based on a simple idea: filters known from image processing are used to extract the attack and harmonic parts of the spectrum, and the ratio of their overall strengths is used as the final feature. The feature takes values in the unit range and is largely independent of the overall loudness. We present a number of experiments that indicate the potential of the proposed feature. A suggested application scenario is to write the feature value into the comments field of an audio file, so that it can be used by a number of existing audio players in conjunction with metadata-based search mechanisms, most notably genre.
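A hedged sketch of this idea in Python, using median filters as one possible choice of "filters known from image processing"; the actual filters, kernel sizes, and STFT settings in the paper may differ.

```python
import numpy as np
from scipy.signal import stft, medfilt2d

def percussiveness(x, fs, n_fft=1024, kernel=17):
    """Scalar in [0, 1]: near 0 for harmonic-dominated material, near 1
    for percussive-dominated material. The median filters and the
    kernel size are illustrative assumptions."""
    _, _, X = stft(x, fs, nperseg=n_fft)
    S = np.abs(X)
    harm = medfilt2d(S, kernel_size=(1, kernel))   # horizontal structures
    perc = medfilt2d(S, kernel_size=(kernel, 1))   # vertical structures
    # Ratio of overall strengths, mapped into the unit interval; any
    # global scaling of S cancels, so the value is loudness-independent.
    return perc.sum() / (perc.sum() + harm.sum() + 1e-12)
```

Because the numerator and denominator scale identically with input gain, the ratio form directly yields the loudness independence claimed in the abstract.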
Harmonic/Percussive Separation Using Median Filtering

In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique applies median filtering to a spectrogram of the audio signal: median filtering across successive frames suppresses percussive events and enhances harmonic components, while median filtering across frequency bins enhances percussive events and suppresses harmonic components. The two resulting median-filtered spectrograms are then used to generate masks, which are applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
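A minimal Python sketch of the described pipeline using SciPy; the kernel lengths and the Wiener-style soft masks are illustrative assumptions (the paper may use different kernel sizes or binary masks).

```python
import numpy as np
from scipy.signal import stft, istft, medfilt2d

def hpss_median(x, fs, n_fft=1024, h_kernel=17, p_kernel=17):
    """Split x into harmonic and percussive parts via median filtering
    of the magnitude spectrogram (kernel sizes are illustrative)."""
    _, _, X = stft(x, fs, nperseg=n_fft)
    S = np.abs(X)
    # Median filter across time frames (along each frequency row):
    # suppresses percussive events, keeps sustained harmonics.
    H = medfilt2d(S, kernel_size=(1, h_kernel))
    # Median filter across frequency bins (along each time column):
    # suppresses harmonic components, keeps broadband percussive onsets.
    P = medfilt2d(S, kernel_size=(p_kernel, 1))
    # Soft (Wiener-style) masks from the two filtered spectrograms.
    eps = 1e-12
    mask_h = H ** 2 / (H ** 2 + P ** 2 + eps)
    mask_p = P ** 2 / (H ** 2 + P ** 2 + eps)
    _, x_h = istft(X * mask_h, fs, nperseg=n_fft)
    _, x_p = istft(X * mask_p, fs, nperseg=n_fft)
    return x_h, x_p
```

For remixing, the two output signals can simply be recombined at different gains, e.g. attenuating `x_p` to soften the drums.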
Singing Voice Separation Based on Non-Vocal Independent Component Subtraction and Amplitude Discrimination

Many applications of Music Information Retrieval can benefit from effective isolation of the music sources. Earlier work by the authors led to the development of a system that is based on Azimuth Discrimination and Resynthesis (ADRess) and can extract the singing voice from reverberant stereophonic mixtures. We propose an extension to our previous method that is not based on ADRess and exploits both channels of the stereo mix more effectively. For the evaluation of the system, we use a dataset that contains songs convolved during the mastering as well as the mixing process (i.e. “real-world” conditions). The metrics for objective evaluation are based on bss_eval.
Fusing Block-Level Features for Music Similarity Estimation

In this paper, we present a novel approach to computing music similarity based on block-level features. We first introduce three novel block-level features: the Variance Delta Spectral Pattern (VDSP), the Correlation Pattern (CP) and the Spectral Contrast Pattern (SCP). Then we describe how to combine the extracted features into a single similarity function. A comprehensive evaluation based on genre classification experiments shows that the combined block-level similarity measure (BLS) is comparable, in terms of quality, to the best current method from the literature. But BLS has the important advantage of being based on a vector space representation, which directly facilitates a number of useful operations, such as PCA analysis, k-means clustering and visualization. We also show that there is still potential for further improvement of music similarity measures by combining BLS with another state-of-the-art algorithm; the combined algorithm then outperforms all other algorithms in our evaluation. Additionally, we discuss the problem of album and artist effects in the context of similarity-based recommendation and show that one can detect the presence of such effects in a given dataset by analyzing the nearest-neighbor classification results.
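As an illustration of fusing several feature-specific distances into one similarity function, the sketch below combines per-feature distance matrices in Python; the z-score normalization and uniform weights are assumptions for illustration, not the paper's actual combination scheme.

```python
import numpy as np

def fuse_distances(dist_mats, weights=None):
    """Fuse several per-feature distance matrices (e.g. one each for
    VDSP, CP and SCP) into a single distance matrix. Each matrix is
    z-score-normalized so that no single feature dominates; the
    normalization and weighting are illustrative assumptions."""
    if weights is None:
        weights = np.ones(len(dist_mats)) / len(dist_mats)
    fused = np.zeros_like(dist_mats[0], dtype=float)
    for d, w in zip(dist_mats, weights):
        fused += w * (d - d.mean()) / (d.std() + 1e-12)
    return fused
```

The nearest neighbors under the fused distance can then feed the genre classification experiments used for evaluation.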
Drum Music Transcription Using Prior Subspace Analysis and Pattern Recognition

Polyphonic music transcription has been an active field of research for several decades, with significant progress in the past few years. In the specific case of automatic drum music transcription, several approaches have been proposed, some of which are based on feature analysis, source separation and template matching. In this paper, we propose an approach that incorporates some simple rules of music theory with the goal of improving the performance of conventional low-level drum transcription methods. In particular, we use Prior Subspace Analysis for the initial drum transcription, and we statistically process its output in order to recognize drum patterns and perform error correction. Experiments on polyphonic popular recordings showed that the proposed method improved the accuracy of the original transcription results from 75% to over 90%.
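One simple reading of this statistical post-processing is bar-wise majority voting over the low-level onset grid, sketched below in Python; the bar length, binary activations, and voting rule are all assumptions for illustration, not the paper's actual pattern recognition stage.

```python
import numpy as np

def pattern_correct(activations, bar_len):
    """Majority-vote error correction over repeated bar-length drum
    patterns. activations: binary (n_frames,) onset grid for one drum
    class from the low-level transcriber (here, thresholded Prior
    Subspace Analysis output). Returns the corrected, bar-aligned
    portion of the grid; any trailing partial bar is discarded."""
    n_bars = len(activations) // bar_len
    grid = activations[:n_bars * bar_len].reshape(n_bars, bar_len)
    # Positions hit in most bars are treated as part of the pattern;
    # isolated insertions/deletions are corrected toward that pattern.
    pattern = grid.mean(axis=0) > 0.5
    return np.tile(pattern, n_bars).astype(int)
```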
Automatic Segmentation of the Temporal Evolution of Isolated Acoustic Musical Instrument Sounds Using Spectro-Temporal Cues

The automatic segmentation of isolated musical instrument sounds according to their temporal evolution is not a trivial task. It requires a model capable of accurately capturing regions such as the attack, decay, sustain and release for many types of instruments with different modes of excitation. The traditional ADSR amplitude envelope model does not apply universally to acoustic musical instrument sounds with different excitation methods, because it uses strictly amplitude information and supposes that all sounds manifest the same temporal evolution. We present an automatic segmentation technique based on a more realistic model of the temporal evolution of many types of acoustic musical instruments, one that incorporates both temporal and spectro-temporal cues. The method allows a robust and more perceptually relevant automatic segmentation of the isolated sounds of the many musical instruments that fit the model.
Time-Dependent Parametric and Harmonic Templates in Non-Negative Matrix Factorization

This paper presents a new method, derived from Non-negative Matrix Factorization (NMF), to decompose musical spectrograms. The method uses time-varying harmonic templates (atoms) which are parametric: these atoms correspond to musical notes. The templates are synthesized from parameter values learned in an NMF framework. This parameterization makes it possible to accurately model musical effects (such as vibrato) which are poorly modeled by standard NMF.
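For orientation, the Python sketch below shows the standard fixed-template NMF baseline that the paper's time-varying parametric atoms generalize: a static harmonic comb per note, with only the activations updated. All numerical choices here are illustrative assumptions.

```python
import numpy as np

def harmonic_template(f0_bin, n_bins, n_harm=10, width=2.0):
    """Fixed harmonic comb atom: Gaussian lobes at integer multiples of
    the fundamental bin, with 1/h amplitude decay (both choices are
    illustrative). A static stand-in for the paper's time-varying
    parametric atoms."""
    bins = np.arange(n_bins)
    w = sum(np.exp(-0.5 * ((bins - h * f0_bin) / width) ** 2) / h
            for h in range(1, n_harm + 1))
    return w / (w.sum() + 1e-12)

def nmf_kl_activations(V, W, n_iter=100, eps=1e-12):
    """Multiplicative KL-divergence updates for the activations H only,
    with the note templates W held fixed: V is approximated by W @ H."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H
```

A fixed comb like this cannot follow pitch modulations; the paper's contribution is precisely to let the atom parameters vary over time so that effects such as vibrato are captured.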