Estimation and Modeling of Pinna-Related Transfer Functions
This paper considers the problem of modeling pinna-related transfer functions (PRTFs) for 3-D sound rendering. Following a structural modus operandi, we present an algorithm for the decomposition of PRTFs into ear resonances and frequency notches due to reflections over pinna cavities. Such an approach allows the evolution of each physical phenomenon to be controlled separately through the design of two distinct filter blocks during PRTF synthesis. The resulting model is suitable for future integration into a structural head-related transfer function model, and for parametrization over anthropometrical measurements of a wide range of subjects.
Spatial Audio Object Coding with Enhanced Audio Object Separation
Spatial sound reproduction on a multi-channel loudspeaker setup is a consistent trend in today’s audio playback systems. Digital surround sound significantly improves the realism of the spatial sound experience, but also results in a drastic increase in the required audio data rate. Spatial Audio Coding (SAC) technology provides a means for efficient storage and transmission of multi-channel signals using a downmix signal and associated parametric side information describing the spatial sound image. More recently, SAC has been extended with an object-based concept termed Spatial Audio Object Coding (SAOC), enabling efficient coding and interactive spatial rendering of multiple individual audio objects at the playback side. Due to the underlying parametric coding approach, object-level manipulations may affect the perceptual quality of the produced sound scene, and extreme object attenuation or boosting may result in unacceptably degraded audio quality. The paper describes how regular SAOC processing is advanced to ensure high-quality sound reproduction even in demanding remix applications.
A Database of Partial Tracks for Evaluation of Sinusoidal Models
This paper presents a database of partial tracks extracted from synthetic as well as pre-recorded musical signals, designed to serve as an ancillary tool for evaluation of sinusoidal analysis algorithms. In order to accomplish this goal, the database requirements have been carefully specified. A semi-automatic analysis methodology to ensure the track parameters are precisely estimated has been employed. The overall methodology is validated via the application of performance tests over the synthetic source-signals.
Discrete Wavelet Transform based Shift-Invariant Analysis Scheme for Transient Sound Signals
The discrete wavelet transform (DWT) has gained widespread recognition and popularity in signal processing due to its ability to underline and represent time-varying spectral properties of many transient and other nonstationary signals. However, the DWT is a shift-variant transform. This shift-variance is a major problem with the use of the DWT for transient signal analysis and pattern recognition applications. A number of modified forms of the DWT have been investigated in recent years that provide an approximately shift-invariant transform, but at the cost of increased redundancy and complexity. In this paper, a shift-invariant analysis scheme is proposed which is nonredundant. This scheme combines minimum-phase (MP) reconstruction with the DWT so that the resultant scheme provides a shift-invariant transform. The detailed properties of the MP signal and different methods to reconstruct it are explained. The proposed scheme can be used for the analysis-synthesis, classification, and compression of transient sound signals.
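A minimal Python sketch of why minimum-phase reconstruction helps with shift-variance follows. It is not the paper's scheme: the abstract does not say which MP reconstruction method is used, so folding of the real cepstrum is assumed here as one common choice, and the test signal, the one-level Haar transform, and all function names are illustrative. The MP signal depends only on the magnitude spectrum, which a circular shift leaves unchanged, so both the original and the shifted transient map to the same MP signal before any DWT is applied.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustrative signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def haar_level1(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def minimum_phase(x, eps=1e-12):
    """Minimum-phase counterpart of x (even length) via the folded real cepstrum."""
    n = len(x)
    log_mag = [math.log(abs(v) + eps) for v in dft(x)]
    cep = [v.real for v in dft(log_mag, inverse=True)]
    folded = [0.0] * n          # causal cepstrum: keep n=0, double 0<n<N/2
    folded[0] = cep[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]
    folded[n // 2] = cep[n // 2]
    spec = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spec, inverse=True)]

def circular_shift(x, s):
    return x[-s:] + x[:-s]

# Assumed test signal: a short decaying transient starting at sample 4.
signal = [0.0] * 16
for i in range(6):
    signal[4 + i] = (0.6 ** i) * math.cos(1.3 * i)
shifted = circular_shift(signal, 1)

# Plain DWT detail coefficients change when the input is shifted by one sample...
dwt_diff = max(abs(a - b) for a, b in zip(haar_level1(signal)[1], haar_level1(shifted)[1]))
# ...whereas the MP signals agree up to rounding error.
mp_diff = max(abs(a - b) for a, b in zip(minimum_phase(signal), minimum_phase(shifted)))
print(dwt_diff, mp_diff)
```

The invariance shown here holds for circular shifts of a finite frame; how the paper handles linear shifts and frame boundaries is not covered by this sketch.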
The Restoration of Single Channel Audio Recordings Based on Non-Negative Matrix Factorization and Perceptual Suppression Rule
In this paper, we focus on signal-to-noise ratio (SNR) improvement in single-channel audio recordings. Many approaches have been reported in the literature. The most popular method, with many variants, is Short Time Spectral Attenuation (STSA). Although this method reduces the noise and improves the SNR, it tends to introduce signal distortion and a perceptually annoying residual noise usually called musical noise. In this paper we investigate the use of Non-negative Matrix Factorization (NMF) as an alternative to STSA for the digital curation of musical heritage. NMF is an emerging technique for the blind extraction of signals recorded in a variety of different fields. The application of NMF to the analysis of monaural recordings is relatively recent. We show that NMF is a suitable technique to extract the clean audio signal from undesired non-stationary noise in a monaural recording of ethnic music. More specifically, we introduce a perceptual suppression rule to determine how the perceptual domain is competitive compared to the acoustic domain. Moreover, we carry out a listening test in order to compare NMF with a state-of-the-art audio restoration framework using the EBU MUSHRA test method. The encouraging results obtained with this methodology in the presented case study support its wider applicability in audio separation.
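For readers unfamiliar with NMF, the factorization step can be sketched as below. This is only an illustration under assumptions: it uses the standard multiplicative updates for the Euclidean cost, a tiny made-up magnitude "spectrogram" `v`, and hypothetical function names. The abstract does not state which cost function or update rule the authors use, and the perceptual suppression rule is not reproduced here.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (freq x frames) as W (freq x rank) times H (rank x frames)."""
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_frames)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(rank)] for i in range(n_freq)]
        errors.append(frob_error(v, w, h))
    return w, h, errors

# Assumed toy data: two spectral templates (columns) with time-varying gains.
templates = [[1.0, 0.1], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
gains = [[1.0, 0.5, 0.0, 0.7, 0.2, 0.0], [0.3, 0.3, 0.4, 0.3, 0.3, 0.4]]
v = matmul(templates, gains)

w, h, errors = nmf(v, rank=2)
print(errors[0], errors[-1])
```

In a restoration setting, the columns of `w` would be interpreted as spectral templates and the rows of `h` as their activations over time; deciding which components belong to the music and which to the noise is a separate step that this sketch leaves out.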
Towards a Fuzzy Logic Approach to Drum Pattern Humanisation
A fuzzy logic-based approach can be used to simulate human agents in many control situations. Numerous authors have noted that this methodology has advantages for a variety of tasks within the realm of computer music. In this paper, a review of such projects is conducted and a rudimentary example application of fuzzy logic techniques is presented. This automatically achieves a basic level of 'humanisation' of a drum pattern through strike velocity modification. Such a tool could significantly reduce the time spent on editing individual drum hits in a music production environment and has potential applications for rhythmic composition and performance.
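As an illustration of how fuzzy rules might drive strike velocity modification, here is a small Python sketch. Everything in it is an assumption made for the example rather than the paper's design: the single input (a metrical strength between 0 and 1), the three triangular membership functions, the rule outputs of -15, -5 and +8 velocity units, and the weighted-average (zero-order Sugeno) defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peak of 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets over metrical strength, each paired with a velocity offset.
RULES = [
    (lambda s: tri(s, -0.5, 0.0, 0.5), -15.0),  # IF beat is weak   THEN play softer
    (lambda s: tri(s, 0.0, 0.5, 1.0), -5.0),    # IF beat is medium THEN play slightly softer
    (lambda s: tri(s, 0.5, 1.0, 1.5), 8.0),     # IF beat is strong THEN accent
]

def humanise_velocity(base_velocity, strength):
    """Return a MIDI velocity (1..127) adjusted by the fuzzy rule base."""
    weights = [membership(strength) for membership, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        offset = 0.0
    else:
        # Weighted average of the rule outputs (zero-order Sugeno defuzzification).
        offset = sum(w * out for w, (_, out) in zip(weights, RULES)) / total
    return max(1, min(127, round(base_velocity + offset)))

# One beat of sixteenth notes, all programmed at velocity 100.
strengths = [1.0, 0.25, 0.5, 0.25]
pattern = [humanise_velocity(100, s) for s in strengths]
print(pattern)  # [108, 90, 95, 90]
```

A practical tool would likely add further inputs (tempo, instrument, position in the phrase) and some controlled randomness; those are left out to keep the rule evaluation easy to follow.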
GPU-Based Spectral Model Synthesis for Real-Time Sound Rendering
The timbre of an instrument is usually represented by sinusoids plus noise. Spectral modeling synthesis (SMS) is an audio synthesis technique which can create musical timbre and give control over frequency and amplitude. Additive synthesis and LPC synthesis are usually applied for synthesizing the sinusoids and the residual, respectively. However, these algorithms require considerable computing power. The purpose of this paper is to present GPU-based techniques for implementing SMS for real-time audio processing by exploiting the parallelism and programmability of the graphics pipeline. The performance is compared to CPU-based implementations.
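The sinusoidal part of such a model can be sketched as a plain CPU reference in Python. This is an illustrative simplification, not the paper's GPU implementation: each partial is assumed to have a fixed frequency, amplitude and phase (SMS partials normally vary from frame to frame), and the function name and parameters are hypothetical.

```python
import math

def additive_synth(partials, n_samples, sample_rate):
    """Sum stationary sinusoids.

    partials: list of (frequency_hz, amplitude, phase_radians) tuples.
    """
    out = [0.0] * n_samples
    for freq, amp, phase in partials:
        step = 2.0 * math.pi * freq / sample_rate  # phase increment per sample
        for n in range(n_samples):
            # Every (partial, sample) pair is independent of the others,
            # which is what makes the computation easy to parallelise.
            out[n] += amp * math.sin(step * n + phase)
    return out

# A 220 Hz tone with three harmonics of decreasing amplitude, 10 ms at 44.1 kHz.
tone = additive_synth(
    [(220.0, 0.6, 0.0), (440.0, 0.3, 0.0), (660.0, 0.1, 0.0)],
    n_samples=441,
    sample_rate=44100.0,
)
```

The cost grows with the number of partials times the number of output samples, which gives a sense of why a large number of partials becomes expensive in real time on a CPU.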
Virtual Acoustic Recording: An Interactive Approach
In this paper, we present a framework for recording real musical auditory scenes for interactive virtual acoustic reproduction over headphones. The framework considers the parameterization of real-world soundfields and subsequent real-time auralization using a hybrid image source method/measurement-based auralization approach. First Order (FOA) and Higher Order (HOA) Ambisonics are utilized together in a single system to provide an optimized and psychoacoustically justified framework.
Statistical Spectral Envelope Transformation applied to Emotional Speech
Transformation of sound by statistical techniques is a promising method for a new range of digital audio effects. In this paper, a data-driven voice transformation algorithm is used to alter the timbre of a neutral (non-emotional) voice in order to reproduce a particular emotional vocal timbre. Perceptually based Mel-Cepstral analysis and a Mel Log Spectral Approximation digital filter are used to represent the speech timbre and to synthesize speech with a modified spectral envelope. The transformation function adopts a GMM (Gaussian Mixture Model) based parametrization in order to convert the spectral envelopes. Experiments with the first- and second-order derivatives of the mel-cepstral coefficients have been undertaken to assess the benefit of including dynamic information in the model. The proposed algorithm has been evaluated by means of objective measures in the neutral-to-happy and neutral-to-sad tasks.
Analysis / Synthesis of Rolling Sounds Using a Source Filter Approach
In this paper, the analysis and synthesis of a rolling ball sound is proposed. The approach is based on the assumption that the rolling sound is generated by a concatenation of micro-impacts between a ball and a surface, each having associated resonances. Contact timing information is first extracted from the rolling sound using an onset detection process. The resulting individual contact segments are subband filtered before being analyzed using linear predictive coding (LPC) and notch filter parameter estimation. The segments are then resynthesized and overlap-added to form a complete rolling sound. This approach is similar to that of [1], though the methods used for contact event detection and filter parameter estimation are completely different.
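The LPC step applied to one contact segment can be sketched as follows. This is a generic autocorrelation-method LPC with the Levinson-Durbin recursion, given as an assumption about what such an analysis might look like; the paper's onset detection, subband filtering, notch filter estimation and overlap-add stages are not shown, and the synthetic "segment" and all function names are illustrative.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite segment."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; returns (a, prediction_error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(segment, order):
    return levinson_durbin(autocorr(segment, order), order)

def all_pole_filter(a, excitation):
    """Resynthesis: y[n] = e[n] - sum_j a[j] * y[n - j]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(acc)
    return y

# Assumed stand-in for one micro-impact: a single decaying resonance at DC.
segment = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(segment, order=1)

# Exciting the estimated all-pole filter with an impulse recreates the decay.
impulse = [1.0] + [0.0] * 9
resynth = all_pole_filter(coeffs, impulse)
print(coeffs, resynth[:4])
```

Real contact segments would need a higher model order to capture several resonances, and each resynthesized segment would then be windowed and overlap-added at its detected onset time.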