Multi-feature modeling of pulse clarity: Design, validation, and optimisation
Pulse clarity is considered a high-level musical dimension that conveys how easily listeners can perceive the underlying rhythmic or metrical pulsation in a given musical piece, or at a particular moment during that piece. The objective of this study is to establish a composite model explaining pulse clarity judgments from the analysis of audio recordings, decomposed into a set of independent factors related to various musical dimensions. To evaluate the pulse clarity model, 25 participants rated the pulse clarity of one hundred excerpts from movie soundtracks. The mapping between the model predictions and the ratings was carried out via regressions. More than three quarters of the variance in listeners' ratings can be explained with a combination of periodicity-based and nonperiodicity-based factors.
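As a rough illustration of what "explained variance" means here, the following sketch fits a one-factor least-squares line and reports the coefficient of determination R². It is only a minimal example under stated assumptions: the study combines several factors, whereas this sketch uses a single hypothetical predictor, and the function names are illustrative rather than taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Fraction of the variance of y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot
```

An R² above 0.75 for the fitted model corresponds to "more than three quarters of the rating variance explained".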
Acoustic features for music piece structure analysis
Automatic analysis of the structure of a music piece aims to recover its sectional form: segmentation into musical parts, such as chorus or verse, and detection of repeated occurrences. A music signal is here described with features that are assumed to deliver information about its structure: mel-frequency cepstral coefficients, chroma, and rhythmogram. The features can be focused on different time scales of the signal. Two distance measures are presented for comparing musical sections: “stripes” for detecting repeated feature sequences, and “blocks” for detecting homogeneous sections. The features and their time scales are evaluated in a system-independent manner. Based on the obtained information, the features and distance measures are evaluated in an automatic structure analysis system with a large music database with manually annotated structures. The evaluations show that in a realistic situation, feature combinations perform better than individual features.
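The intuition behind "stripes" can be sketched with a self-distance matrix: a repeated feature sequence shows up as a low-distance diagonal away from the main diagonal. The code below is a minimal sketch, assuming Euclidean distance and hypothetical helper names; it is not the paper's actual distance measure.

```python
import math


def self_distance_matrix(features):
    """Pairwise Euclidean distances between per-frame feature vectors."""
    return [[math.dist(a, b) for b in features] for a in features]


def diagonal_mean(sdm, lag, start, length):
    """Mean distance along the diagonal at the given lag.

    A low value suggests that frames start..start+length-1 are repeated
    `lag` frames later, which is the "stripe" pattern.
    """
    values = [sdm[i][i + lag] for i in range(start, start + length)]
    return sum(values) / len(values)
```

A "block" would instead appear as a square region of uniformly low distance, which could be scored by averaging over a square sub-matrix rather than a diagonal.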
An experimental comparison of time delay weights for direction of arrival estimation
When direction of arrival is estimated using time differences of arrival, the estimation accuracy is determined by the accuracy of the time delay estimates. The probability of large errors increases in poor signal conditions, and reverberant conditions pose a significant challenge. To overcome these problems, reliability criteria for time delays and weighted least squares direction estimation have been proposed. This work combines these approaches and experimentally compares several weight criteria for single-frame estimation. Testing is conducted on different types of audio signals in a loudspeaker experiment. As a result, an optimum combination of weights is found, whose performance exceeds earlier proposals and iterated weighting. Furthermore, the optimum weighting is not dependent on the source signal type, and the best weights are the ones that do not require information about the underlying time delay estimator.
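Weighted least squares direction estimation can be sketched as follows for a planar, far-field setup. This is only an illustration under stated assumptions: the model `c * tau_i = d_i . u` (with `d_i` the microphone-pair baseline and `u` the direction vector), the sign convention, the speed of sound, and the function name are all assumptions, and the weights are simply supplied by the caller rather than derived from any of the reliability criteria the paper compares.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def wls_direction(baselines, delays, weights):
    """Estimate a 2-D direction angle (degrees) from pairwise time delays.

    baselines: (dx, dy) vector of each microphone pair, in metres
    delays:    time difference of arrival for each pair, in seconds
    weights:   non-negative reliability weight for each pair
    Minimises sum_i w_i * (d_i . u - c * tau_i)^2 over the 2-D vector u.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (dx, dy), tau, w in zip(baselines, delays, weights):
        r = SPEED_OF_SOUND * tau  # path-length difference
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
        bx += w * dx * r
        by += w * dy * r
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("weighted baselines do not span the plane")
    # Solve the 2x2 normal equations in closed form.
    ux = (syy * bx - sxy * by) / det
    uy = (sxx * by - sxy * bx) / det
    return math.degrees(math.atan2(uy, ux))
```

Giving a pair a weight of zero removes its delay from the solution, which is how an unreliable time delay estimate can be prevented from pulling the direction estimate away.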
Identification of individual guitar sounds by support vector machines
This paper introduces an automatic classification system for the identification of individual classical guitars by single notes played on these guitars. The classification is performed by Support Vector Machines (SVM) that have been trained with the features of the single notes. The features used for classification were the time series of the partial tones, the time series of the MFCCs (Mel Frequency Cepstral Coefficients), and the “nontonal” contributions to the spectrum. The influences of these features on the classification success are reported. With this system, 80% of the sounds recorded with three different guitars were classified correctly. A supplementary classification experiment carried out with human listeners resulted in a rate of 65% correct classifications.
Automatic alignment of music audio and lyrics
This paper proposes an algorithm for aligning singing in polyphonic music audio with textual lyrics. As preprocessing, the system uses a voice separation algorithm based on melody transcription and sinusoidal modeling. The alignment is based on a hidden Markov model speech recognizer where the acoustic model is adapted to singing voice. The textual input is preprocessed to create a language model consisting of a sequence of phonemes, pauses and possible instrumental breaks. The Viterbi algorithm is used to align the audio features with the text. On a test set consisting of 17 commercial recordings, the system achieves an average absolute error of 1.40 seconds in aligning lines of the lyrics.
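The Viterbi alignment step can be illustrated with a much simplified forced-alignment sketch. It assumes a strictly left-to-right state sequence (one state per phoneme, no skips, no transition probabilities) and takes precomputed per-frame log-likelihoods as input; the function name and this simplified topology are assumptions, not the recognizer described in the paper.

```python
def forced_align(log_lik):
    """Viterbi alignment of frames to a fixed left-to-right state sequence.

    log_lik[t][s] is the log-likelihood of frame t under state s.
    The path starts in state 0, ends in the last state, and at each frame
    either stays in the current state or advances to the next one.
    Returns the state index assigned to each frame.
    """
    n_frames, n_states = len(log_lik), len(log_lik[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                score[t][s] = advance + log_lik[t][s]
                back[t][s] = s - 1
            else:
                score[t][s] = stay + log_lik[t][s]
                back[t][s] = s
    # Trace the best path backwards from the final state.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

The frame at which the path first enters a state gives that state's start time, so line-level timestamps follow from the first phoneme of each lyric line.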
Robustness and independence of voice timbre features under live performance acoustic degradations
Live performance situations can lead to degradations in the vocal signal from a typical microphone, such as ambient noise or echoes due to feedback. We investigate the robustness of continuous-valued timbre features measured on vocal signals (speech, singing, beatboxing) under simulated degradations. We also consider nonparametric dependencies between features, using information-theoretic measures and a feature-selection algorithm. We discuss how robustness and independence issues reflect on the choice of acoustic features for use in constructing a continuous-valued vocal timbre space. While some measures (notably spectral crest factors) emerge as good candidates for such a task, others are poor, and some features such as ZCR exhibit an interaction with the type of voice signal being analysed.
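Two of the features named above can be sketched per frame as follows. The definitions are common textbook ones and are assumptions here: the zero-crossing rate counts sign changes between adjacent samples, and the crest factor is taken as the ratio of the maximum to the mean of the magnitude spectrum over the whole band (the paper may compute it per sub-band or on the power spectrum). A naive DFT keeps the example dependency-free.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def spectral_crest(frame):
    """Max-to-mean ratio of the magnitude spectrum (bins 0..N/2)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        mags.append(abs(acc))
    mean = sum(mags) / len(mags)
    return max(mags) / mean if mean > 0 else 0.0
```

A flat spectrum gives a crest factor of 1, while a single strong partial gives a large value, which is why the measure separates tonal from noisy frames.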
Analysis of piano tones using an inharmonic inverse comb filter
This paper presents a filter configuration for canceling and separating partials from inharmonic piano tones. The proposed configuration is based on inverse comb filtering, in which the delay line is replaced with a high-order filter that has a proper phase response. Two filter design techniques are tested with the method: an FIR filter, which is designed using frequency sampling, and an IIR filter, which consists of a set of second-order allpass filters that match the desired group delay. It is concluded that it is possible to obtain more accurate results with the FIR filter, while the IIR filter is computationally more efficient. The paper shows that the proposed analysis method provides an effective and easy way of extracting the residual signal and selecting partials from piano tones. This method is suitable for analysis of recorded piano tones.
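For orientation, the plain (harmonic) inverse comb filter that the method starts from can be sketched as below: subtracting a copy of the signal delayed by one period cancels every partial at a multiple of the fundamental. This sketch uses an integer-sample delay line only and therefore does not handle inharmonicity; the paper's contribution is to replace that delay line with a high-order FIR or allpass filter whose phase response follows the stretched partials, which is not implemented here. The function name is illustrative.

```python
def inverse_comb(x, delay):
    """y[n] = x[n] - x[n - delay], with zeros assumed before the signal start.

    Cancels components whose period divides `delay` samples, leaving the
    residual (noise, transients, and anything not harmonically related).
    """
    return [
        x[n] - (x[n - delay] if n >= delay else 0.0)
        for n in range(len(x))
    ]
```

After the first `delay` samples, a perfectly periodic input yields an output of essentially zero, so whatever remains can be read as the residual signal.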
Sound transformation by descriptor using an analytic domain
In many applications of sound transformation, such as sound design, mixing, mastering, and composition, the user interactively searches for appropriate parameters. However, automatic applications of sound transformation, such as mosaicing, may require choosing parameters without user intervention. When the target can be specified by its synthesis context, or by example (from features of the example), “adaptive effects” can provide such control. But few general strategies exist for building adaptive effects from arbitrary sets of transformations and descriptor targets. In this study, we decouple the usually direct link between analysis and transformation in adaptive effects, attempting to include more diverse transformations and descriptors in adaptive transformation, albeit at the cost of additional complexity or difficulty. We build an analytic model of a deliberately simple transformation-descriptor (TD) domain, and show some preliminary results.
Frame level audio similarity - A codebook approach
Modeling audio signals via the long-term statistical distribution of their local spectral features, often called the bag-of-frames (BOF) approach, is a popular and powerful method to describe audio content. While modeling the distribution of local spectral features by semi-parametric distributions (e.g. Gaussian Mixture Models) has been studied intensively, we investigate a non-parametric variant based on vector quantization (VQ) in this paper. The essential advantage of the proposed VQ approach over state-of-the-art audio similarity measures is that the similarity metric proposed here forms a normed vector space. This allows for more powerful search strategies, e.g. KD-Trees or Locality-Sensitive Hashing (LSH), making content-based audio similarity available for even larger music archives. Standard VQ approaches are known to be computationally very expensive; to counter this problem, we propose a multi-level clustering architecture. Additionally, we show that the multi-level vector quantization approach (ML-VQ), in contrast to standard VQ approaches, is comparable to state-of-the-art frame-level similarity measures in terms of quality. Another important finding w.r.t. the ML-VQ approach is that, in contrast to GMM models of songs, our approach does not seem to suffer from the recently discovered hub problem.
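The basic single-level codebook idea can be sketched as follows: each frame is assigned to its nearest codeword, a song becomes a normalised histogram of codeword counts, and two songs are compared with a norm-induced distance between histograms. This is a minimal sketch under assumptions: the codebook is taken as given (no clustering is shown), the L1 distance is chosen for illustration and may differ from the paper's metric, and the multi-level architecture is not implemented.

```python
def nearest_codeword(codebook, frame):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - f) ** 2 for c, f in zip(codebook[i], frame)),
    )


def codeword_histogram(codebook, frames):
    """Normalised histogram of codeword usage over all frames of a song."""
    counts = [0.0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(codebook, frame)] += 1.0
    return [c / len(frames) for c in counts]


def l1_distance(hist_a, hist_b):
    """L1 distance between two histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))
```

Because each song is a fixed-length vector compared with a true norm, the histograms can be indexed directly with structures such as KD-trees, which is the property the abstract highlights.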
Comb-filter free audio mixing using STFT magnitude spectra and phase estimation
This paper presents a new audio mixing algorithm which avoids comb-filter distortions when mixing an input signal with time-delayed versions of itself. Instead of a simple signal addition in the time domain, the proposed method calculates the short-time Fourier magnitude spectra of the input signals and adds them. The sum determines the output magnitude on the time-frequency plane, whereas a modified RTISI algorithm estimates the missing phase information. An evaluation using PEAQ shows that the proposed method yields much better results than temporal mixing for nonzero delays up to 10 ms.
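Why adding magnitudes avoids comb filtering can be seen in an idealised steady-state case. For a stationary sinusoid, mixing a signal with a copy delayed by `delay_s` in the time domain gives a gain of |1 + e^(-j 2 pi f delay)|, which drops to zero at odd multiples of 1 / (2 * delay); adding the magnitudes instead gives a gain of 2 at every frequency. The sketch below only evaluates these two gains; it ignores STFT framing and does not attempt the phase estimation step, and the function names are illustrative.

```python
import cmath
import math


def time_domain_gain(freq_hz, delay_s):
    """Gain of x(t) + x(t - delay) for a steady sinusoid at freq_hz."""
    return abs(1 + cmath.exp(-2j * math.pi * freq_hz * delay_s))


def magnitude_domain_gain(freq_hz, delay_s):
    """Gain when the two magnitudes are added and the phase is discarded."""
    delayed = cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return abs(1) + abs(delayed)
```

With a 1 ms delay, the time-domain mix has notches at 500 Hz, 1500 Hz, and so on, while the magnitude-domain sum stays flat; the remaining problem, which the paper addresses with phase estimation, is finding a phase consistent with that summed magnitude.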