On Restoring Prematurely Truncated Sine Sweep Room Impulse Response Measurements
When measuring room impulse responses using swept sinusoids, it often occurs that the sine sweep room response recording is terminated soon after the sine sweep ends, before the long-lasting low-frequency modes have fully decayed. In the presence of typical acoustic background noise levels, perceivable artifacts can emerge from the process of converting such a prematurely truncated sweep response into an impulse response. In particular, a low-pass noise process with a time-varying cutoff frequency will appear in the measured room impulse response, a result of the frequency-dependent time shift applied to the sweep response to form the impulse response. Here, we detail the artifact, describe methods for restoring the impulse response measurement, and present a case study using measurements from the Berkeley Art Museum shortly before its demolition. We show that while the difficulty may be avoided using circular convolution, nonlinearities typical of loudspeakers will corrupt the room impulse response. The problem can be alleviated by stitching synthesized noise onto the end of the sweep response before converting it into an impulse response. Two noise synthesis methods are described: the first uses a filter bank to estimate the frequency-dependent measurement noise power and then filters synthesized white Gaussian noise accordingly; the second uses a linear-phase filter, formed by smoothing the recorded noise spectrum across perceptual bands, to filter Gaussian noise. In both cases, we demonstrate that, by time-extending the recording with noise similar to the recorded background noise, we can push the problem out in time so that it no longer interferes with the measured room impulse response.
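The noise-stitching idea can be illustrated with a minimal NumPy sketch. This is not the paper's exact algorithm: it shapes white Gaussian noise (via random spectral phase) to the average magnitude spectrum of a recorded background-noise segment and appends it to the truncated sweep response; the function name and the simple power-matching heuristic are illustrative assumptions.

```python
import numpy as np

def extend_with_matched_noise(sweep_resp, noise_seg, n_extend, seed=0):
    """Append synthetic noise spectrally matched to a recorded noise segment.

    Illustrative sketch: shape random-phase noise with the measured
    magnitude spectrum, roughly preserving the noise power.
    """
    rng = np.random.default_rng(seed)
    # Magnitude spectrum of the recorded background-noise segment
    mag = np.abs(np.fft.rfft(noise_seg))
    # Resample the magnitude profile to the extension's bin grid
    target_bins = n_extend // 2 + 1
    mag = np.interp(np.linspace(0.0, 1.0, target_bins),
                    np.linspace(0.0, 1.0, mag.size), mag)
    # FFT magnitudes of white noise scale with sqrt(length); compensate
    mag *= np.sqrt(n_extend / noise_seg.size)
    # Random phase yields Gaussian-like noise with the target spectrum
    phase = rng.uniform(0.0, 2.0 * np.pi, target_bins)
    synth = np.fft.irfft(mag * np.exp(1j * phase), n=n_extend)
    return np.concatenate([sweep_resp, synth])
```

After the extension, the frequency-dependent time shift that forms the impulse response acts on noise that statistically resembles the true background, pushing the truncation artifact later in time.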
Constrained Pole Optimization for Modal Reverberation
The problem of designing a modal reverberator to match a measured room impulse response is considered. The modal reverberator architecture expresses a room impulse response as a parallel combination of resonant filters, with the pole locations determined by the room resonances and decay rates, and the zeros by the source and listener positions. Our method first estimates the pole positions in a frequency-domain process involving a series of constrained pole position optimizations in overlapping frequency bands. With the pole locations in hand, the zeros are fit to the measured impulse response using least squares. Example optimizations for a medium-sized room show a good match between the measured and modeled room responses.
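The modal model described above can be sketched in the time domain: each mode is an exponentially decaying sinusoid whose frequency and decay come from the poles and whose amplitude and phase come from the zeros. This is a generic illustration of the architecture, not the paper's fitting procedure; the decay constant uses ln(1000) ≈ 6.91 so that the envelope reaches −60 dB at the mode's T60.

```python
import numpy as np

def modal_ir(freqs, t60s, amps, phases, fs, dur):
    """Impulse response as a parallel sum of decaying sinusoids (modes).

    freqs/t60s set the pole locations (resonance and decay),
    amps/phases stand in for the zero-determined mode gains.
    """
    t = np.arange(int(fs * dur)) / fs
    ir = np.zeros(t.size)
    for f, t60, a, p in zip(freqs, t60s, amps, phases):
        decay = np.exp(-6.91 * t / t60)  # -60 dB at t = t60
        ir += a * decay * np.cos(2.0 * np.pi * f * t + p)
    return ir
```

In the actual method, the (freqs, t60s) pairs would come from the constrained band-wise pole optimization, and (amps, phases) from the least-squares zero fit.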
Diffuse-field Equalisation of First-order Ambisonics
Timbre is a crucial element of believable and natural binaural synthesis. This paper presents a method for diffuse-field equalisation of first-order Ambisonic binaural rendering, aiming to address the timbral disparity that exists between Ambisonic rendering and head-related transfer function (HRTF) convolution, as well as between different Ambisonic loudspeaker configurations. The presented work is then evaluated through listening tests, and results indicate diffuse-field equalisation is effective in improving timbral consistency.
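A common way to compute a diffuse-field equalisation filter, sketched below under the assumption of a set of HRIRs sampled over many directions, is to average the HRTF power across directions and invert the result with regularisation. The direction weighting, regularisation constant, and linear-phase inversion are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def diffuse_field_eq(hrirs, weights=None, n_fft=256, reg=1e-3):
    """Inverse diffuse-field FIR from a set of HRIRs (directions x taps).

    The diffuse-field response is the (weighted) RMS magnitude over
    directions; its regularised inverse flattens the average timbre.
    """
    H = np.fft.rfft(hrirs, n_fft, axis=1)
    if weights is None:
        w = np.full(len(hrirs), 1.0 / len(hrirs))  # uniform sampling assumed
    else:
        w = np.asarray(weights, float) / np.sum(weights)
    df_power = np.sum(w[:, None] * np.abs(H) ** 2, axis=0)
    inv_mag = 1.0 / np.sqrt(df_power + reg)  # regularised inversion
    # Zero-phase magnitude turned into a causal linear-phase FIR
    fir = np.fft.irfft(inv_mag, n_fft)
    return np.roll(fir, n_fft // 2)
```

In a solid-angle-correct implementation, `weights` would carry the quadrature weights of the measurement grid rather than a uniform average.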
Improving Elevation Perception with a Tool for Image-guided Head-related Transfer Function Selection
This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a subject’s pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that subject. The proposed procedure is designed to be quickly applied and easy to use for a user without prior knowledge of binaural audio technologies. The entire process is evaluated by means of an auditory model for sound localization in the mid-sagittal plane available from previous literature. Using virtual subjects from an HRTF database, a virtual experiment is implemented to assess the vertical localization performance of the database subjects when they are provided with HRTF sets selected by the proposed procedure. Results report a statistically significant improvement in predictions of localization performance for selected HRTFs compared to the KEMAR HRTF, which is a commercial standard in many binaural audio solutions; moreover, the proposed analysis provides useful indications to refine the perceptually-motivated metrics that guide the selection.
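The pinna-to-notch relation can be sketched with a simple reflection model: a sound reflected off a pinna contour at distance d from the ear canal interferes destructively when the path difference is half a wavelength, giving a first notch near c/(2d). Both the model and the nearest-match selection below are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def notch_frequency(d_mm, c=343.0):
    """First pinna-reflection notch (Hz) for an ear-to-contour
    distance d in millimetres, under a half-wavelength model."""
    d = d_mm * 1e-3
    return c / (2.0 * d)

def select_hrtf(image_notches, db_notches):
    """Pick the database subject whose notch-frequency track best
    matches the image-derived one (RMS error, hypothetical metric)."""
    image_notches = np.asarray(image_notches, float)
    errs = {subj: np.sqrt(np.mean((np.asarray(n, float) - image_notches) ** 2))
            for subj, n in db_notches.items()}
    return min(errs, key=errs.get)
```

In practice the notch tracks would span several elevations in the mid-sagittal plane, and the distance metric would be perceptually weighted.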
Velvet Noise Decorrelator
Decorrelation of audio signals is an important process in the spatial reproduction of sounds. For instance, a mono signal that is spread over multiple loudspeakers should be decorrelated for each channel to avoid undesirable comb-filtering artifacts. The process of decorrelating the signal itself is a compromise, aiming to reduce the correlation as much as possible while minimizing both the sound coloration and the computing cost. A popular decorrelation method convolves the sound signal with a short sequence of exponentially decaying white noise; however, this requires the use of the FFT for fast convolution and may introduce some latency. Here we propose a decorrelator based on a sparse random sequence called velvet noise, which achieves comparable results without latency and at a smaller computing cost. A segmented temporal decay envelope can also be implemented for further optimization. Using the proposed method, we found that a decorrelation filter with perceptual attributes similar to those of a white-noise filter could be implemented using 87% fewer operations. Informal listening tests suggest that the resulting decorrelation filter performs comparably to an equivalent white-noise filter.
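A velvet-noise sequence places exactly one ±1 impulse at a random position within each grid period, so convolution reduces to a handful of additions. The sketch below generates such a sequence with an optional exponential decay; the per-impulse decay parameterisation (rather than the paper's segmented envelope) is a simplifying assumption.

```python
import numpy as np

def velvet_noise(length, density, fs, t60=None, seed=0):
    """Sparse velvet-noise sequence: one random-sign impulse per grid
    period of fs/density samples, at a random offset within the period.

    If t60 is given, impulses are scaled by an exponential decay
    reaching -60 dB at t60 seconds (simplified, non-segmented envelope).
    """
    rng = np.random.default_rng(seed)
    Td = fs / density  # average impulse spacing in samples
    seq = np.zeros(length)
    m = 0
    while True:
        idx = int(round(m * Td + rng.uniform() * (Td - 1)))
        if idx >= length:
            break
        sign = 1.0 if rng.uniform() < 0.5 else -1.0
        gain = 1.0 if t60 is None else 10.0 ** (-3.0 * idx / (fs * t60))
        seq[idx] = sign * gain
        m += 1
    return seq
```

Because only `density * dur` taps are nonzero, the convolution needs that many additions per output sample instead of a full FFT-based multiply, which is where the reported operation savings come from.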
Parametric Acoustic Camera for Real-time Sound Capture, Analysis and Tracking
This paper details a software implementation of an acoustic camera, which utilises a spherical microphone array and a spherical camera. The software builds on the Cross Pattern Coherence (CroPaC) spatial filter, which has been shown to be effective in reverberant and noisy sound field conditions. It is based on determining the cross spectrum between two coincident beamformers. The technique is exploited in this work to capture and analyse sound scenes by estimating a probability-like parameter of sounds appearing at specific locations. Current techniques that utilise conventional beamformers perform poorly in reverberant and noisy conditions, due to the side-lobes of the beams used for the power map. In this work, we propose an additional algorithm to suppress side-lobes based on the product of multiple CroPaC beams. A Virtual Studio Technology (VST) plug-in has been developed for both the transformation of the time-domain microphone signals into the spherical harmonic domain and the main acoustic camera software; both can be downloaded from the companion web-page.
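The cross-spectrum idea can be sketched as follows: for each look direction, correlate an omnidirectional beam with a directional beam steered there, and map the (rectified, normalised) real cross-spectrum to a probability-like value. The normalisation below is one plausible choice, not necessarily the paper's exact formula, and the beam signals are assumed to be precomputed STFT frames.

```python
import numpy as np

def cropac_map(W, Y_dirs, eps=1e-12):
    """Probability-like CroPaC-style parameter per look direction.

    W      : omni (zeroth-order) beam STFT, shape (frames, bins)
    Y_dirs : dict direction -> coincident directional beam STFT,
             same shape as W
    Returns a per-bin value in [0, 1] for each direction; the
    normalisation is an illustrative assumption.
    """
    out = {}
    for d, Y in Y_dirs.items():
        cross = np.mean(np.real(W * np.conj(Y)), axis=0)  # cross-spectrum
        norm = np.mean(np.abs(W) ** 2 + np.abs(Y) ** 2, axis=0) + eps
        # Half-wave rectification: negative coherence -> no source
        out[d] = np.clip(2.0 * cross / norm, 0.0, 1.0)
    return out
```

Side-lobe suppression as proposed in the paper would then multiply the maps of several differently oriented beam pairs, so that only directions where all beams agree retain high values.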
Estimating Pickup and Plucking Positions of Guitar Tones and Chords with Audio Effects
In this paper, we introduce an approach to estimate the pickup position and plucking point on an electric guitar for both single notes and chords recorded through an effects chain. We evaluate the accuracy of the method on direct input signals along with 7 different combinations of guitar amplifier, effects, loudspeaker cabinet and microphone. The autocorrelation of the spectral peaks of the electric guitar signal is calculated, and the two minima that correspond to the locations of the pickup and plucking event are detected. In order to model the frequency response of the effects chain, we flatten the spectrum using polynomial regression. The errors decrease after applying the spectral flattening method. The median absolute error for each preset ranges from 2.10 mm to 7.26 mm for pickup position and 2.91 mm to 21.72 mm for plucking position estimates. For strummed chords, faster strums are more difficult to estimate but still yield accurate results, with median absolute errors for pickup position estimates below 10 mm.
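The spectral-flattening step can be sketched directly: fit a low-order polynomial to the log-magnitude spectrum, which captures the broad colouration of the amplifier/cabinet/microphone chain, and subtract it so that only the comb-like pickup and plucking nulls remain. The polynomial order is an illustrative assumption.

```python
import numpy as np

def flatten_spectrum(mag_db, order=5):
    """Remove broad effects-chain colouration from a log-magnitude
    spectrum by subtracting a least-squares polynomial trend.

    mag_db : spectral peak magnitudes in dB (1-D array)
    order  : polynomial order modelling the smooth frequency response
    """
    x = np.arange(mag_db.size, dtype=float)
    coeffs = np.polyfit(x, mag_db, order)      # smooth trend = chain response
    return mag_db - np.polyval(coeffs, x)      # residual keeps the comb nulls
```

After flattening, the autocorrelation of the residual peak sequence exhibits the two minima from which the pickup and plucking positions are read off.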
Unsupervised Taxonomy of Sound Effects
Sound effect libraries are commonly used by sound designers in a range of industries. Taxonomies exist for the classification of sounds into groups based on subjective similarity, sound source or common environmental context. However, these taxonomies are not standardised, and no taxonomy based purely on the sonic properties of audio exists. We present a method using feature selection, unsupervised learning and hierarchical clustering to develop an unsupervised taxonomy of sound effects based entirely on the sonic properties of the audio within a sound effect library. The unsupervised taxonomy is then related back to the perceived meaning of the relevant audio features.
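A minimal version of the clustering stage can be sketched with SciPy: standardise the selected audio features, build a hierarchical tree with agglomerative (Ward) linkage, and cut it into groups. The linkage method and the flat cut are illustrative choices; the paper's pipeline additionally performs feature selection before this step.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def taxonomy(features, n_groups=3):
    """Hierarchical taxonomy from an (items x features) matrix.

    Features are z-score standardised so no single feature dominates
    the Euclidean distances used by Ward linkage.
    """
    X = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
    Z = linkage(X, method='ward')  # dendrogram = the taxonomy tree
    return fcluster(Z, n_groups, criterion='maxclust')
```

The dendrogram `Z` itself is the taxonomy; cutting at different depths yields coarser or finer groupings of the sound effect library.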
The Mix Evaluation Dataset
Research on perception of music production practices is mainly concerned with the emulation of sound engineering tasks through lab-based experiments and custom software, sometimes with unskilled subjects. This can improve the level of control, but the validity, transferability, and relevance of the results may suffer from this artificial context. This paper presents a dataset consisting of mixes gathered in a real-life, ecologically valid setting, and perceptual evaluation thereof, which can be used to expand knowledge on the mixing process. With 180 mixes including parameter settings, close to 5000 preference ratings and free-form descriptions, and a diverse range of contributors from five different countries, the data offers many opportunities for music production analysis, some of which are explored here. In particular, more experienced subjects were found to be more negative and more specific in their assessments of mixes, and to increasingly agree with each other.
The Snail: A Real-time Software Application to Visualize Sounds
The Snail is a real-time software application that offers possibilities for visualizing sounds and music, for tuning musical instruments, for working on pitch intonation, etc. It incorporates an original spectral analysis technology (patent-pending) combined with a display on a spiral representation: the center corresponds to the lowest frequencies, the outside to the highest frequencies, and each turn corresponds to one octave, so that tones are organized with respect to angles. The spectrum magnitude is displayed according to perceptual features, in a redundant way: the loudness is mapped to both the line thickness and its brightness. However, because of the time-frequency uncertainty principle, using the Fourier spectrum (or the Q-transform, wavelets, etc.) does not provide sufficient accuracy for use in a musical context. The spectral analysis is therefore complemented by a frequency precision enhancer based on post-processing of the demodulated phase of the spectrum. This paper presents the scientific principles, some technical aspects of the software development, and the main display modes with examples of use cases.
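The spiral layout described above is easy to state as a mapping: the angle encodes the pitch class (one full turn per octave) and the radius grows by a fixed amount per octave, so octave-related tones line up along the same ray. The reference frequency and radial growth rate below are illustrative parameters, not the application's actual values.

```python
import numpy as np

def spiral_coords(freq, f_ref=440.0, r0=1.0, dr=0.35):
    """Map a frequency (Hz) to (x, y) on a Snail-style spiral.

    One turn per octave: angle = 2*pi*log2(f/f_ref), radius grows
    by dr per octave starting from r0 at f_ref.
    """
    octaves = np.log2(np.asarray(freq, dtype=float) / f_ref)
    theta = 2.0 * np.pi * octaves   # angle encodes pitch class
    r = r0 + dr * octaves           # radius encodes the octave
    return r * np.cos(theta), r * np.sin(theta)
```

For example, 440 Hz and 880 Hz land at the same angle, one octave-step further out on the spiral, which is exactly what makes the display useful for tuning.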