Download A Single-Azimuth Pinna-Related Transfer Function Database
Pinna-Related Transfer Functions (PRTFs) reflect the modifications undergone by an acoustic signal as it interacts with the listener’s outer ear. They can be seen as the pinna’s contribution to the Head-Related Transfer Function (HRTF). This paper describes a database of PRTFs collected from measurements performed at the Department of Signal Processing and Acoustics, Aalto University. Median-plane PRTFs at 61 different elevation angles from 25 subjects are included. This data collection is part of a broader project investigating the correspondence between PRTF features and anthropometry.
Download Modal analysis of impact sounds with ESPRIT in Gabor transforms
Identifying the acoustical modes of a resonant object can be achieved by expanding a recorded impact sound into a sum of damped sinusoids. High-resolution methods, e.g. the ESPRIT algorithm, can be used, but the length of the signal often requires a sub-band decomposition. This ensures, thanks to sub-sampling, that the signal is analysed over a significant duration, so that the damping coefficient of each mode is estimated properly and no frequency band is neglected. In this article, we show that the ESPRIT algorithm can be efficiently applied in a Gabor transform (similar to a sub-sampled short-time Fourier transform). The combined use of a time-frequency transform and a high-resolution analysis allows sharp, selective analysis over chosen areas of the time-frequency plane. Finally, we show that this method produces high-quality resynthesized impact sounds which are perceptually very close to the original sounds.
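To make the subspace step concrete, the following is a minimal full-band ESPRIT sketch in Python/NumPy (our own simplified illustration, not the paper's Gabor-domain variant): it recovers the pole of each damped sinusoid from the shift invariance of the signal subspace of a Hankel data matrix.

```python
# Minimal ESPRIT sketch for damped-sinusoid (modal) parameter estimation.
# Illustrative only: operates on the raw signal, not on Gabor coefficients.
import numpy as np

def esprit_damped(x, n_modes):
    """Estimate poles z_k = exp(-d_k + 2j*pi*f_k) of damped sinusoids in x."""
    n = len(x)
    m = n // 2  # Hankel matrix height
    # Hankel data matrix: each row is a shifted snippet of the signal
    H = np.array([x[i:i + n - m + 1] for i in range(m)])
    # Signal subspace = leading left singular vectors
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :n_modes]
    # Shift invariance: Us[1:] ~= Us[:-1] @ Phi; poles = eigenvalues of Phi
    Phi = np.linalg.pinv(Us[:-1]) @ Us[1:]
    return np.linalg.eigvals(Phi)

# One damped mode at normalized frequency 0.1, damping 0.01 per sample
t = np.arange(256)
x = np.exp(-0.01 * t) * np.cos(2 * np.pi * 0.1 * t)
poles = esprit_damped(x, 2)  # real signal -> conjugate pole pair
freqs = np.abs(np.angle(poles)) / (2 * np.pi)
damps = -np.log(np.abs(poles))
```

In the paper's setting, this same subspace step would be applied to the sub-sampled coefficients of each time-frequency region rather than to the full-length signal.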
Download Automated Calibration of a Parametric Spring Reverb Model
The calibration of a digital spring reverberator model is crucial for the authenticity and quality of the sound produced by the model. In this paper, an automated calibration of the model parameters is proposed, based on analysing the spectrogram, the energy decay curve, the spectrum, and the autocorrelations of both the time signal and the spectrogram. A visual inspection of the spectrograms, as well as a comparison of sound samples, shows the approach to be successful for estimating the parameters of reverberators with one, two and three springs. This indicates that the proposed method is a viable alternative to manual calibration of spring reverberator models.
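One of the analysis quantities mentioned above, the energy decay curve, can be computed from an impulse response by Schroeder backward integration. The sketch below is our own minimal version of that single step (the paper's full calibration pipeline combines several such descriptors):

```python
# Energy decay curve (EDC) via Schroeder backward integration.
import numpy as np

def energy_decay_curve_db(ir):
    """EDC in dB, normalized so the curve starts at 0 dB."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]  # energy remaining after each sample
    return 10.0 * np.log10(energy / energy[0])

# Synthetic exponentially decaying noise as a stand-in for a spring response
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
ir = rng.standard_normal(fs) * np.exp(-3.0 * t)
edc = energy_decay_curve_db(ir)  # monotonically decaying curve in dB
```

Reverberation-time-style parameters can then be read off the EDC, e.g. by fitting a line to its early decay.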
Download Similarity-based Sound Source Localization with a Coincident Microphone Array
This paper presents a robust, accurate sound source localization method using a compact, near-coincident microphone array. We derive features by combining the microphone signals and determine the direction of a single sound source by similarity matching. To this end, the observed features are compared with a set of previously measured reference features, which are stored in a look-up table. By proper processing in the similarity domain, we are able to deal with signal pauses and low SNR without the need for a separate detection algorithm. For practical evaluation, we made recordings of speech signals (both loudspeaker playback and a human speaker) with a planar 4-channel prototype array in a medium-sized room. The proposed approach clearly outperforms existing coincident localization methods. We achieve high accuracy (2° mean absolute azimuth error at 0 dB SNR) for static sources, while being able to follow rapid source angle changes quickly.
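The look-up-table matching described above can be sketched as follows. This is a toy stand-in: the two-dimensional "features" and the 5° table spacing are hypothetical choices, not the paper's actual feature design.

```python
# Toy similarity-based localization: compare an observed feature vector
# against pre-measured reference features per azimuth, pick the best match.
import numpy as np

def localize(observed, reference_table):
    """reference_table: dict mapping azimuth (degrees) -> feature vector."""
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    similarities = {az: cosine(observed, ref)
                    for az, ref in reference_table.items()}
    return max(similarities, key=similarities.get)

# Hypothetical reference table: unit vectors pointing toward each azimuth
table = {az: np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])
         for az in range(0, 360, 5)}
obs = np.array([np.cos(np.radians(93)), np.sin(np.radians(93))])  # source near 93 deg
estimate = localize(obs, table)  # snaps to the nearest table entry, 95
```

Working in this similarity domain is what makes a separate detection stage unnecessary: during signal pauses, all similarities stay low and the estimate can simply be held.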
Download Synchronization of intonation adjustments in violin duets: towards an objective evaluation of musical interaction
In ensemble music performance, such as a string quartet or duet, the musicians interact and influence each other’s performance via a multitude of parameters – including tempo, dynamics, articulation of musical phrases and, depending on the type of instrument, intonation. This paper presents our ongoing research on the effect of interaction between violinists, in terms of intonation. We base our analysis on a series of experiments with professional as well as amateur musicians playing in duet and solo experimental set-ups, and then apply a series of interdependence measures to each violinist’s pitch deviations from the score. Our results show that while it is possible to distinguish between solo and duet performances based solely on intonation in simple cases, a multitude of underlying factors needs to be analyzed before these techniques can be applied to more complex pieces and/or non-experimental situations.
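One simple interdependence measure of the kind applied above is the Pearson correlation between the two players' pitch-deviation series. The sketch below uses synthetic deviations in which one player partially follows the other's intonation drift; it is an illustrative stand-in, not one of the paper's specific measures.

```python
# Pearson correlation between two pitch-deviation series (in cents).
import numpy as np

def pitch_deviation_correlation(dev_a, dev_b):
    a = np.asarray(dev_a, dtype=float) - np.mean(dev_a)
    b = np.asarray(dev_b, dtype=float) - np.mean(dev_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic data: player B partially follows player A's shared drift
rng = np.random.default_rng(1)
drift = np.cumsum(rng.normal(0, 1, 200))       # slow shared intonation drift
dev_a = drift + rng.normal(0, 2, 200)          # player A: drift + own noise
dev_b = 0.8 * drift + rng.normal(0, 2, 200)    # player B: attenuated drift
r = pitch_deviation_correlation(dev_a, dev_b)  # clearly positive
```

In practice such a measure must be computed on score-aligned note segments, and correlation alone cannot separate mutual adaptation from a shared reference (e.g. both players drifting with the room).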
Download Sound Analysis and Synthesis Adaptive in Time and Two Frequency Bands
We present an algorithm for sound analysis and resynthesis with local automatic adaptation of time-frequency resolution. Several existing algorithms adapt the analysis window depending on its time or frequency location; in what follows we propose a method which selects the optimal resolution depending on both time and frequency. We consider an approach that we denote as analysis-weighting, from the point of view of Gabor frame theory. We analyze in particular the case of different adaptive time-varying resolutions within two complementary frequency bands; this is a typical case where perfect signal reconstruction cannot in general be achieved with fast algorithms, introducing a certain error that has to be minimized. We provide examples of adaptive analyses of a music sound, and outline several possibilities that this work opens.
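As a hedged illustration of automatic resolution selection, the sketch below compares two STFT window lengths and keeps the one whose magnitude spectrogram is sparser (lower Shannon entropy) — one common selection criterion, assumed here for illustration; the paper's Gabor-frame analysis-weighting scheme is more elaborate and operates per band and per time region.

```python
# Entropy-based choice between two analysis window lengths (illustrative).
import numpy as np

def stft_mag(x, win_len, hop):
    w = np.hanning(win_len)
    frames = [x[i:i + win_len] * w
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

def spectrogram_entropy(mag):
    """Shannon entropy of the normalized power spectrogram (low = sparse)."""
    p = mag.ravel() ** 2
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)  # stationary tone favours long windows
entropies = {n: spectrogram_entropy(stft_mag(tone, n, n // 2))
             for n in (256, 2048)}
best = min(entropies, key=entropies.get)  # long window wins for a steady tone
```

For a transient-rich signal the short window would win instead, which is exactly why a resolution adapted jointly in time and frequency is attractive.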
Download Audio De-Thumping using Huang’s Empirical Mode Decomposition
In the context of audio restoration, sound transfer from broken disks usually produces audio signals corrupted with long pulses of low-frequency content, also called thumps. This paper presents a method for audio de-thumping based on Huang’s Empirical Mode Decomposition (EMD), provided the pulse locations are known beforehand. The EMD is used as a means to obtain pulse estimates to be subtracted from the degraded signals. Despite its simplicity, the method is shown to handle the challenging problem of superimposed pulses well. Performance assessment against selected competing solutions reveals that the proposed method tends to produce superior de-thumping results.
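The core EMD operation is sifting: subtracting the mean of the upper and lower extrema envelopes so that the fast oscillation is separated from a slow residue. The sketch below is a crude single sifting pass (linear-interpolated envelopes instead of the cubic splines normally used), showing how the slow residue can serve as a thump estimate to subtract:

```python
# One EMD-style sifting pass; the residue approximates the low-frequency thump.
import numpy as np

def sift_once(x):
    """Subtract the mean of the upper/lower envelopes from the signal."""
    n = np.arange(len(x))
    maxima = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])) + 1
    minima = np.flatnonzero((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:])) + 1
    upper = np.interp(n, maxima, x[maxima])  # envelope through the maxima
    lower = np.interp(n, minima, x[minima])  # envelope through the minima
    return x - (upper + lower) / 2.0         # fast part; the rest is residue

# Synthetic degraded signal: audio-band tone plus a slow "thump"
t = np.linspace(0, 1, 2000)
audio = 0.3 * np.sin(2 * np.pi * 80 * t)
thump = np.sin(2 * np.pi * 2 * t)
degraded = audio + thump
imf = sift_once(degraded)
pulse_estimate = degraded - imf       # slow residue, approximates the thump
restored = degraded - pulse_estimate  # keeps the audio-band content
```

Real de-thumping is harder than this toy case: sifting is iterated, several intrinsic mode functions may contribute to the pulse, and the known pulse locations constrain where subtraction is applied.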
Download Combining classifications based on local and global features: application to singer identification
In this paper we investigate the problem of singer identification on a cappella recordings of isolated notes. Most studies on singer identification describe the content of singing-voice signals with features related to timbre (such as MFCC or LPC). These features aim to describe the behavior of frequencies at a given instant of time (local features). In this paper, we propose to describe a sung tone with the temporal variations of the fundamental frequency (and its harmonics) of the note. The periodic and continuous variations of the frequency trajectories are analyzed over the whole note, and the features obtained reflect expressive and intonative elements of singing such as vibrato, tremolo and portamento. The experiments, conducted on two distinct data-sets (lyric and pop-rock singers), show that the new set of features captures a part of the singer identity. However, these features are less accurate than timbre-based features. We propose to increase the recognition rate of singer identification by combining the information conveyed by local and global descriptions of notes. The proposed method, which shows good results, can be adapted to classification problems involving a large number of classes, or to combine classifications with different levels of performance.
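One simple way to combine two classifiers as described above is late fusion of their per-class scores. The sketch below uses a weighted sum with a hypothetical weight (e.g. tuned on a validation set); it illustrates the combination idea only, not the paper's specific fusion rule.

```python
# Toy late fusion of a timbre-based (local) and trajectory-based (global)
# classifier: weighted sum of normalized per-class scores.
import numpy as np

def fuse(local_scores, global_scores, w_local=0.7):
    """w_local is a hypothetical weight favouring the stronger classifier."""
    local = np.asarray(local_scores, dtype=float)
    glob = np.asarray(global_scores, dtype=float)
    local = local / local.sum()
    glob = glob / glob.sum()
    return w_local * local + (1.0 - w_local) * glob

# Three candidate singers: local model favours singer 1, global favours singer 2
local = [0.2, 0.5, 0.3]
glob = [0.1, 0.3, 0.6]
combined = fuse(local, glob)
predicted = int(np.argmax(combined))  # the better-weighted local vote wins
```

Giving the timbre-based classifier the larger weight reflects the abstract's observation that it is the more accurate of the two, while the global features still contribute complementary evidence.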
Download Sinusoid Extraction and Salience Function Design for Predominant Melody Estimation
In this paper we evaluate some of the alternative methods commonly applied in the first stages of the signal processing chain of automatic melody extraction systems. Namely, the first two stages are studied: the extraction of sinusoidal components and the computation of a time-pitch salience function. The goal is to determine the benefits and caveats of each approach in the specific context of predominant melody estimation. The approaches are evaluated on a data-set of polyphonic music containing several musical genres with different singing/playing styles, using metrics specifically designed to measure the usefulness of each step for melody extraction. The results suggest that equal-loudness filtering and frequency/amplitude correction methods provide significant improvements, whilst using a multi-resolution spectral transform results in only a marginal improvement compared to the standard STFT. The effect of key parameters in the computation of the salience function is also studied and discussed.
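A minimal harmonic-summation salience function of the kind studied above can be sketched as follows: each pitch candidate's salience is a weighted sum of spectral magnitudes at its harmonics. The number of harmonics and the decay weight below are hypothetical parameter choices of exactly the kind whose effect such a study examines.

```python
# Harmonic-summation pitch salience over a set of f0 candidates.
import numpy as np

def salience(spectrum_mag, fs, f0_candidates, n_harmonics=5, decay=0.8):
    """Salience(f0) = sum over harmonics h of decay^(h-1) * |X(h*f0)|."""
    n_fft = 2 * (len(spectrum_mag) - 1)  # rfft length back to FFT size
    sal = []
    for f0 in f0_candidates:
        s = 0.0
        for h in range(1, n_harmonics + 1):
            bin_idx = int(round(h * f0 * n_fft / fs))
            if bin_idx < len(spectrum_mag):
                s += (decay ** (h - 1)) * spectrum_mag[bin_idx]
        sal.append(s)
    return np.array(sal)

# Harmonic tone at 220 Hz: salience should peak at the true f0
fs, n = 8000, 4096
t = np.arange(n) / fs
x = sum(np.sin(2 * np.pi * 220 * h * t) / h for h in range(1, 4))
mag = np.abs(np.fft.rfft(x * np.hanning(n)))
cands = np.arange(110.0, 450.0, 10.0)
best_f0 = cands[np.argmax(salience(mag, fs, cands))]
```

Reading magnitudes at rounded bin indices is the crudest option; the frequency/amplitude correction methods evaluated in the paper refine exactly this step by re-estimating each sinusoid's true frequency and amplitude between bins.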