MOSPALOSEP: A Platform for the Binaural Localization and Separation of Spatial Sounds using Models of Interaural Cues and Mixture Models In this paper, we present the MOSPALOSEP platform for the localization and separation of binaural signals. Our methods use short-time spectra of the recorded binaural signals. Based on a parametric model of the binaural mix, we exploit the joint evaluation of interaural cues to derive the location of each time-frequency bin. We then describe different approaches to localization: some based on an energy-weighted histogram in azimuth space, and others based on a Gaussian mixture model whose number of sources is identified in an unsupervised way using the Minimum Description Length criterion. In this way, we use the revealed Gaussian mixture model structure to identify the region dominated by each source in a multi-source mix. A bank of spatial masks allows the extraction of each source according to posterior probabilities or to Maximum Likelihood binary masks. An important condition is the Windowed-Disjoint Orthogonality of the sources in the time-frequency domain. We assess the source separation algorithms specifically on instrument mixes, where this fundamental condition is not satisfied.
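The cue-extraction and clustering pipeline described above can be illustrated with a short sketch: interaural level and phase differences are computed for every time-frequency bin and clustered with a Gaussian mixture, from which per-bin binary masks follow. This is an illustration of the general approach only, not the MOSPALOSEP code; the STFT settings, the energy-weighted resampling, and the use of scikit-learn's GaussianMixture (with BIC as a stand-in for the MDL criterion) are assumptions.

```python
import numpy as np
from scipy.signal import stft
from sklearn.mixture import GaussianMixture

def localize_bins(left, right, fs, n_components=2):
    """Cluster per-bin interaural cues; returns binary spatial masks and the GMM."""
    _, _, L = stft(left, fs=fs, nperseg=1024)
    _, _, R = stft(right, fs=fs, nperseg=1024)
    eps = 1e-12
    ild = 20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))  # interaural level difference
    ipd = np.angle(L * np.conj(R))                              # interaural phase difference
    cues = np.stack([ild.ravel(), ipd.ravel()], axis=1)
    energy = (np.abs(L) ** 2 + np.abs(R) ** 2).ravel()
    # Bias the fit toward high-energy bins via energy-weighted resampling.
    rng = np.random.default_rng(0)
    idx = rng.choice(len(cues), size=min(len(cues), 20000), p=energy / energy.sum())
    gmm = GaussianMixture(n_components=n_components, covariance_type="full").fit(cues[idx])
    labels = gmm.predict(cues).reshape(L.shape)          # dominant source per bin
    masks = [labels == k for k in range(n_components)]   # binary spatial masks
    return masks, gmm
```

Model order selection in the MDL spirit could be approximated by fitting several values of n_components and keeping the one with the lowest gmm.bic(cues[idx]).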
Exploring Phase Information in Sound Source Separation Applications Separation of instrument sounds from polyphonic music recordings is a desirable signal processing function with a wide variety of applications in music production, video games and information retrieval. In general, sound source separation algorithms attempt to exploit those characteristics of audio signals that differentiate one source from another. Many algorithms have studied spectral magnitude as a means for separation tasks. Here we propose the exploration of phase information of musical instrument signals as an alternative dimension for discriminating sound signals originating from different sources. Three cases are presented: (1) Phase contours of musical instrument notes as potential separation features. (2) Resolving overlapping harmonics using phase coupling properties of musical instruments. (3) Harmonic/percussive decomposition using calculated radian ranges for each frequency bin.
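Case (3) lends itself to a compact sketch: a steady sinusoid advances its phase between frames by an amount predictable from its bin frequency, so bins whose measured phase advance stays within an expected radian range can be marked harmonic and the rest percussive. The STFT settings and the tolerance below are illustrative assumptions, not the paper's calculated ranges.

```python
import numpy as np
from scipy.signal import stft

def phase_based_hp_mask(x, fs, nperseg=2048, hop=512, tol=0.6):
    """True entries mark harmonic-dominated bins, False percussive-dominated ones."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    phase = np.angle(X)
    k = np.arange(X.shape[0])[:, None]
    expected = 2 * np.pi * hop * k / nperseg        # phase advance of a bin-centered sinusoid
    dphi = np.diff(phase, axis=1)
    dev = np.angle(np.exp(1j * (dphi - expected)))  # deviation, wrapped to (-pi, pi]
    return np.abs(dev) < tol
```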
Comparison of SRP-PHAT and Multiband-PoPi Algorithms for Speaker Localization Using Particle Filters The task of localizing single and multiple concurrent speakers in a reverberant environment with background noise poses several problems. One of the major problems is the severe corruption of the frame-wise localization estimates. To improve the overall localization accuracy, we propose a particle filter based tracking algorithm using the recently proposed Multiband Joint Position-Pitch (M-PoPi) localization algorithm as a frame-wise likelihood estimate. To evaluate the performance of our approach, we tested it on real-world recordings of seven different speakers and of up to three concurrent speakers. We compared our new approach to the well-known SRP-PHAT algorithm used as the frame-wise likelihood estimate. Finally, we compared both particle filter based tracking algorithms with their frame-wise localization algorithms. The M-PoPi based particle filter tracking algorithm outperforms the SRP-PHAT based one. The comparison with the frame-wise localization algorithms shows that this improved performance stems from the more robust M-PoPi frame-wise localization estimate.
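The tracking scheme is a sequential importance resampling (SIR) particle filter in which the frame-wise localization function (M-PoPi or SRP-PHAT) supplies the likelihood. A generic sketch follows; the random-walk motion model, room dimensions and resampling threshold are illustrative assumptions, and `likelihood(pos, frame)` is a placeholder for either localization algorithm.

```python
import numpy as np

def track(frames, likelihood, n_particles=500, room=(5.0, 4.0), sigma=0.05):
    """Track a speaker position through a sequence of audio frames."""
    rng = np.random.default_rng(0)
    particles = rng.uniform([0.0, 0.0], room, size=(n_particles, 2))
    weights = np.full(n_particles, 1.0 / n_particles)
    trajectory = []
    for frame in frames:
        particles += rng.normal(0.0, sigma, particles.shape)  # random-walk prediction
        particles = np.clip(particles, [0.0, 0.0], room)
        weights *= np.array([likelihood(p, frame) for p in particles])
        weights /= weights.sum() + 1e-12
        trajectory.append(weights @ particles)                # weighted-mean estimate
        if 1.0 / np.sum(weights ** 2) < n_particles / 2:      # low effective sample size
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)
    return np.array(trajectory)
```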
A Real-Time System for Multiple Acoustic Sources Localization Based on ISP Comparison The growing demand for automatic surveillance systems that integrate different types of sensors, including microphones, requires adapting and optimizing established Acoustic Source Localization techniques to meet the constraints imposed by the new application scenario. In this paper, we present a real-time prototype for multiple acoustic source localization in a far-field and free-field environment. The prototype is composed of two linear arrays and utilizes an innovative approach for the localization of multiple sources. The algorithm is based on two steps: i) the separation of the sources by means of beamforming techniques and ii) the comparison of the power spectra by means of a spectral distance measure. The prototype was successfully tested in a real environment.
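The two steps can be sketched as follows: a frequency-domain delay-and-sum beamformer steers each linear array toward candidate directions, and a spectral distance between beam outputs of the two arrays pairs the bearings that belong to the same source, whose intersection then gives a position. The far-field geometry, speed of sound and log-spectral distance below are assumptions; the paper's ISP comparison is not reproduced exactly.

```python
import numpy as np

def das_beam(X, freqs, mic_pos, theta, c=343.0):
    """Delay-and-sum beamformer. X: (mics, freqs, frames) STFT of one array;
    mic_pos: mic coordinates along the array axis in meters; theta: steering angle."""
    delays = mic_pos * np.sin(theta) / c                       # plane-wave delays
    steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.mean(steer[:, :, None] * X, axis=0)              # beamformed STFT

def log_spectral_distance(P1, P2, eps=1e-12):
    """Distance between two power spectra; small values suggest the same source."""
    d = 10 * np.log10((P1 + eps) / (P2 + eps))
    return np.sqrt(np.mean(d ** 2))
```

For each beam direction of the first array, the direction of the second array minimizing the distance between the beams' average power spectra would be taken as the matching bearing.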
Template-Based Estimation of Tempo: Using Unsupervised or Supervised Learning to Create Better Spectral Templates In this paper, we study tempo estimation using spectral templates obtained by unsupervised or supervised learning from a database annotated with tempo. More precisely, we study the inclusion of these templates in our tempo estimation algorithm of [1]. For this, we consider as periodicity observation a 48-dimensional vector obtained by sampling the amplitude of the DFT at tempo-related frequencies. We name this the spectral template. A set of reference spectral templates is then learned in an unsupervised or supervised way from an annotated database. These reference spectral templates, combined with all the possible tempo assumptions, constitute the hidden states which we decode using a Viterbi algorithm. Experiments are then performed on the “ballroom dancer” test set, which allows us to conclude that the method improves over the state of the art. In particular, we discuss the use of prior tempo probabilities. It should be noted, however, that these results are only indicative, considering that the training and test set are the same in this preliminary experiment.
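The periodicity observation itself is easy to picture: take the DFT of an onset-strength envelope and sample its magnitude at frequencies tied to a candidate tempo. The sketch below is an assumption-laden illustration; in particular, sampling at the first 48 multiples of half the tempo frequency stands in for the paper's actual choice of 48 tempo-related frequencies.

```python
import numpy as np

def spectral_template(onset_env, fs_feat, bpm, n_dims=48):
    """onset_env: onset-strength signal sampled at fs_feat Hz; returns a 48-dim vector."""
    n = len(onset_env)
    spec = np.abs(np.fft.rfft(onset_env * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_feat)
    f_tempo = bpm / 60.0                                  # tempo frequency in Hz
    sample_at = f_tempo * 0.5 * np.arange(1, n_dims + 1)  # tempo-related frequencies (assumed)
    return np.interp(sample_at, freqs, spec)
```

Reference templates would then be obtained by clustering such vectors (unsupervised, e.g. k-means) or averaging them per annotated class (supervised), and paired with tempo hypotheses to form the Viterbi state space.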
A High-Level Audio Feature for Music Retrieval and Sorting We describe an audio analysis method to create a high-level audio annotation, expressed as a single scalar. Typically, low values of this feature indicate songs with dominant harmonic elements while high values indicate the dominance of mainly percussive or drum-like sounds. The proposed feature is based on a simple idea: filters known from image processing are used to extract attack and harmonic parts of the spectrum, and the ratio of their overall strengths is used as the final feature. The feature takes values in the unit range and is largely independent of the overall loudness. We present a number of experiments that indicate the potential of the proposed feature. A suggested application scenario is to write the feature value into the comments field of an audio file, so that it can be used by a number of existing audio players in conjunction with metadata-based search mechanisms, most notably genre.
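The idea can be sketched in a few lines: in a spectrogram image, attacks appear as vertical edges (strong change along time) and harmonic partials as horizontal ridges (strong change along frequency), so directional image filters separate the two and their strength ratio yields a unit-range scalar. The Sobel filters, log compression and normalization below are assumptions, not necessarily the paper's exact filters.

```python
import numpy as np
from scipy.ndimage import sobel
from scipy.signal import stft

def percussiveness(x, fs):
    """Scalar in (0, 1): high values percussive/drum-like, low values harmonic."""
    _, _, X = stft(x, fs=fs, nperseg=1024)
    S = np.log1p(np.abs(X))              # log compression reduces loudness dependence
    attack = np.abs(sobel(S, axis=1))    # edges along time: attacks/transients
    harmonic = np.abs(sobel(S, axis=0))  # edges along frequency: harmonic ridges
    a, h = attack.sum(), harmonic.sum()
    return a / (a + h + 1e-12)
```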
Harmonic/Percussive Separation using Median Filtering In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal: median filtering is performed across successive frames to suppress percussive events and enhance harmonic components, and across frequency bins to enhance percussive events and suppress harmonic components. The two resulting median-filtered spectrograms are then used to generate masks which are applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
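Since the abstract spells out the whole pipeline, a faithful sketch is straightforward; only the filter lengths, the soft Wiener-style masks and the STFT settings below are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft, medfilt2d

def hp_separate(x, fs, l_harm=17, l_perc=17):
    """Return (harmonic, percussive) time-domain components of x."""
    _, _, X = stft(x, fs=fs, nperseg=4096, noverlap=3072)
    S = np.abs(X)
    H = medfilt2d(S, kernel_size=(1, l_harm))  # median across frames keeps harmonics
    P = medfilt2d(S, kernel_size=(l_perc, 1))  # median across bins keeps percussion
    eps = 1e-12
    mask_h = H ** 2 / (H ** 2 + P ** 2 + eps)  # soft masks on the original spectrogram
    mask_p = P ** 2 / (H ** 2 + P ** 2 + eps)
    _, harm = istft(mask_h * X, fs=fs, nperseg=4096, noverlap=3072)
    _, perc = istft(mask_p * X, fs=fs, nperseg=4096, noverlap=3072)
    return harm, perc
```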
Singing Voice Separation Based on Non-Vocal Independent Component Subtraction and Amplitude Discrimination Many applications of Music Information Retrieval can benefit from effective isolation of the music sources. Earlier work by the authors led to the development of a system that is based on Azimuth Discrimination and Resynthesis (ADRess) and can extract the singing voice from reverberant stereophonic mixtures. We propose an extension to our previous method that is not based on ADRess and exploits both channels of the stereo mix more effectively. For the evaluation of the system we use a dataset that contains songs convolved during mastering as well as during the mixing process (i.e., “real-world” conditions). The metrics for objective evaluation are based on bss_eval.
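On the evaluation side, the bss_eval metrics mentioned above (SDR, SIR and SAR) are available today through the mir_eval package; the sketch below uses that reimplementation rather than the original MATLAB toolbox, which is a substitution on my part.

```python
import numpy as np
import mir_eval

def evaluate_separation(reference_sources, estimated_sources):
    """Both arguments: arrays of shape (n_sources, n_samples)."""
    sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(
        np.asarray(reference_sources), np.asarray(estimated_sources))
    return {"SDR": sdr, "SIR": sir, "SAR": sar, "permutation": perm}
```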
Fusing Block-level Features for Music Similarity Estimation In this paper we present a novel approach to computing music similarity based on block-level features. We first introduce three novel block-level features: the Variance Delta Spectral Pattern (VDSP), the Correlation Pattern (CP) and the Spectral Contrast Pattern (SCP). Then we describe how to combine the extracted features into a single similarity function. A comprehensive evaluation based on genre classification experiments shows that the combined block-level similarity measure (BLS) is comparable, in terms of quality, to the best current method from the literature. But BLS has the important advantage of being based on a vector space representation, which directly facilitates a number of useful operations, such as PCA analysis, k-means clustering and visualization. We also show that there is still potential for further improvement of music similarity measures by combining BLS with another state-of-the-art algorithm; the combined algorithm then outperforms all other algorithms in our evaluation. Additionally, we discuss the problem of album and artist effects in the context of similarity-based recommendation and show that one can detect the presence of such effects in a given dataset by analyzing the nearest-neighbor classification results.
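Combining several block-level features into one similarity function is typically done by normalizing each feature's distance matrix before taking a weighted sum, so that features with different scales contribute comparably. The Euclidean distance, distance-space normalization and placeholder weights below are assumptions about the fusion step, not the paper's exact formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def combined_distance(feature_sets, weights):
    """feature_sets: list of (n_songs, dim_i) arrays, one per block-level feature.
    Returns an (n_songs, n_songs) fused distance matrix; smaller means more similar."""
    total = 0.0
    for F, w in zip(feature_sets, weights):
        D = cdist(F, F, metric="euclidean")
        D = (D - D.mean()) / (D.std() + 1e-12)  # distance-space normalization
        total = total + w * D
    return total
```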
Drum Music Transcription Using Prior Subspace Analysis and Pattern Recognition Polyphonic music transcription has been an active field of research for several decades, with significant progress in the past few years. In the specific case of automatic drum music transcription, several approaches have been proposed, some of which are based on feature analysis, source separation and template matching. In this paper we propose an approach that incorporates some simple rules of music theory with the goal of improving the performance of conventional low-level drum transcription methods. In particular, we use Prior Subspace Analysis for early drum transcription, and we statistically process its output in order to recognize drum patterns and perform error correction. Experiments on polyphonic popular recordings showed that the proposed method improved the transcription accuracy from 75% to over 90%.
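The Prior Subspace Analysis front end can be pictured as fitting fixed per-drum spectral templates to the spectrogram and peak-picking the resulting activation envelopes. The sketch below shows only that front end; the ICA refinement of full PSA and the pattern-based error correction are omitted, and the templates and thresholds are assumed inputs.

```python
import numpy as np
from scipy.signal import find_peaks

def psa_activations(S, templates):
    """S: (freq, frames) magnitude spectrogram; templates: (freq, n_drums) priors.
    Returns non-negative activation envelopes of shape (n_drums, frames)."""
    return np.maximum(np.linalg.pinv(templates) @ S, 0.0)

def pick_onsets(activation, height_ratio=0.3, min_gap=5):
    """Peak-pick one drum's activation envelope; returns onset frame indices."""
    peaks, _ = find_peaks(activation,
                          height=height_ratio * activation.max(),
                          distance=min_gap)
    return peaks
```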