User-Guided Variable-Rate Time-Stretching Via Stiffness Control
User control over variable-rate time-stretching typically requires direct, manual adjustment of the time-dependent stretch rate. For time-stretching with transient preservation, rhythmic warping, rhythmic emphasis modification, or other effects that require additional timing constraints, however, direct manipulation is difficult. As a more user-friendly alternative, we present work that allows a user to specify a time-dependent stiffness curve that warps the time axis of a recording while maintaining other timing constraints, such as a desired overall recording length or musical rhythm quantization (e.g. straight-to-swing), providing a notion of stretchability to sound. To do so, the user-guided stiffness curve and timing constraints are translated into the desired time-dependent stretch rate via a constrained optimization program motivated by a physical spring system. Once the time-dependent stretch rate is computed, appropriately modified variable-rate time-stretch processors are used to process the sound. Initial results are demonstrated using both a phase-vocoder and a pitch-synchronous overlap-add processor.
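The abstract does not give the mapping from stiffness to stretch rate, but the spring analogy suggests a minimal sketch: like springs in series under a common tension, each segment stretches in proportion to its compliance (inverse stiffness), rescaled so the overall-length constraint is met. The function below is a hypothetical illustration of that idea, not the paper's constrained optimization program.

```python
import numpy as np

def stretch_rate_from_stiffness(stiffness, target_ratio):
    """Hypothetical sketch: map a stiffness curve to a local stretch rate.

    Stiffer segments stretch less; rates are rescaled so the mean rate
    equals the desired overall stretch ratio (the total-length constraint).
    """
    compliance = 1.0 / np.asarray(stiffness, dtype=float)
    return target_ratio * compliance / compliance.mean()

# A stiff middle segment resists stretching; the outer segments absorb it,
# while the mean rate (hence total length) still hits the 2x target.
rates = stretch_rate_from_stiffness([1.0, 4.0, 1.0], target_ratio=2.0)
```

Timing constraints beyond total length (e.g. rhythm quantization) would enter as additional constraints on the cumulative warp, which this toy rescaling does not capture.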
Improved PVSOLA Time Stretching and Pitch Shifting for Polyphonic Audio
An advanced phase vocoder technique for high quality audio pitch shifting and time stretching is described. Its main concept is based on the PVSOLA time-stretching algorithm, which is already known to give good results on monophonic speech. Several enhancements are proposed to enable processing of polyphonic material at equal quality by distinguishing between sinusoidal and noisy frequency components. Furthermore, the latency is reduced to move closer to a real-time implementation. The new algorithm is embedded into a flexible pitch shifting and time stretching framework by adding transient detection and resampling. A subjective listening test is used to evaluate the new algorithm and to verify the improvements.
On Stretching Gaussian Noises with the Phase Vocoder
Recently, the processing of non-sinusoidal signals, or sound textures, has become an important topic in various areas. In general, the transformation is done with phase vocoder techniques. Since the phase vocoder technique is based on a sinusoidal model, its performance is not satisfying when applied to transform sound textures. The following article investigates the problem using the most basic non-sinusoidal sounds, noise signals, as an example. We demonstrate the problems that arise when time stretching noise with the phase vocoder, provide a description of some relevant statistical properties of the time-frequency representation of noise, and introduce an algorithm that preserves these statistical properties when time stretching noise with the phase vocoder. The resulting algorithm significantly improves the perceptual quality of the time-stretched noise signals and is therefore seen as a promising first step towards an algorithm for the transformation of sound textures.
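The abstract does not spell out which statistical properties are meant, but one well-known baseline can be checked numerically: for stationary Gaussian noise, the STFT bin magnitudes are Rayleigh distributed, so their mean-to-standard-deviation ratio is the fixed constant sqrt(pi / (4 - pi)) ≈ 1.91. The snippet below (an illustrative assumption, not the paper's algorithm) verifies only this baseline property, which a naive phase-vocoder stretch tends to distort.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal(1 << 16)

# Bare-bones STFT: overlapping Hann-windowed frames, one FFT per frame.
N, hop = 512, 128
win = np.hanning(N)
frames = np.stack([noise[i:i + N] * win
                   for i in range(0, len(noise) - N, hop)])
mags = np.abs(np.fft.rfft(frames, axis=1))[:, 1:-1]  # skip DC and Nyquist

# For Gaussian noise the bin magnitudes are Rayleigh distributed, so the
# pooled mean/std ratio should sit near sqrt(pi / (4 - pi)) ~= 1.91.
ratio = mags.mean() / mags.std()
```

A noise-preserving stretch algorithm can use such statistics as its target: after modification, the stretched signal's time-frequency representation should reproduce them.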
Real-time Auralisation System for Virtual Microphone Positioning
A computer application was developed to simulate the process of microphone positioning in sound recording applications. A dense, regular grid of impulse responses pre-recorded on the region of the room under study allowed the sound captured by a virtual microphone to be auralised through real-time convolution with an anechoic stream representing the sound source. Convolution was performed using a block-based variation on the overlap-add method, where the summation of many small subconvolutions produced each block of output data samples. As the applied RIR filter varied on successive audio output blocks, a short cross-fade was applied to avoid glitches in the audio. The maximum possible length of impulse response applied was governed by the size of audio processing block (hence latency) employed by the program. Larger blocks allowed a lower processing time per sample. At 23.2 ms latency (1024 samples at 44.1 kHz), it was possible to apply 9-second impulse responses on a standard laptop computer.
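The block-based overlap-add scheme described above can be sketched as a uniformly partitioned convolution: the long RIR is cut into equal-length sub-filters, frequency-domain sub-convolutions with correspondingly delayed input blocks are summed, and each result is overlap-added into the output. This is a generic textbook sketch rather than the application's actual code; `B` plays the role of the audio processing block size.

```python
import numpy as np

def partitioned_convolve(x, h, B):
    """Uniformly partitioned overlap-add convolution of signal x with a
    long impulse response h, processed in frequency-domain blocks of size B."""
    n_out = len(x) + len(h) - 1
    P = -(-len(h) // B)                      # number of sub-filters
    H = np.fft.rfft(np.pad(h, (0, P * B - len(h))).reshape(P, B),
                    n=2 * B, axis=1)         # sub-filter spectra
    nb = -(-len(x) // B)
    xp = np.pad(x, (0, nb * B - len(x)))
    X = np.zeros((P, B + 1), dtype=complex)  # spectra of recent input blocks
    y = np.zeros((nb + P) * B)
    tail = np.zeros(B)
    for b in range(nb + P - 1):              # flush P-1 zero blocks at the end
        X = np.roll(X, 1, axis=0)            # shift the input-block history
        X[0] = np.fft.rfft(xp[b * B:(b + 1) * B], n=2 * B) if b < nb else 0.0
        # Each output block sums all sub-convolutions landing on it.
        seg = np.fft.irfft((X * H).sum(axis=0), n=2 * B)
        y[b * B:(b + 1) * B] = seg[:B] + tail  # overlap-add the previous tail
        tail = seg[B:]
    y[(nb + P - 1) * B:] = tail
    return y[:n_out]
```

Per output block this costs one forward FFT, P spectral multiplies, and one inverse FFT, which is why larger blocks lower the processing time per sample at the price of latency.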
Spatial High Frequency Extrapolation Method for Room Acoustic Auralization
Auralization of numerically modeled impulse responses can be informative when assessing the geometric characteristics of a room. Wave-based acoustic modeling methods are suitable for approximating low-frequency wave propagation, but subsequent auralizations are perceived as unnatural due to the limited bandwidth involved. The paper presents a post-processing framework for extending low-mid frequency band-limited spatial room impulse responses (SRIR) to include higher-frequency signal components without the use of geometric modeling methods. Acoustic parameters for extrapolated RIRs are compared with reference measurement data for existing venues, and a Finite Difference Time Domain modeled SRIR is extrapolated to produce a natural-sounding full-band SRIR signal. The method shows promising agreement particularly for large venues, where air absorption is more dominant than boundary absorption at high frequencies.
3D Binaural Audio Capture and Reproduction Using A Miniature Microphone Array
This paper presents a new low-cost and efficient approach for real-time three-dimensional (3D) binaural audio capture and reproduction via headphones using a miniature microphone array. The microphone array is configured in B-format to minimize space requirements, using an omnidirectional microphone and three bidirectional microphones. The signals captured by the microphone array are processed by a set of optimal time-invariant gain vectors, which convert a B-format Ambisonic signal into a binaural signal for headphone reproduction. The optimal time-invariant gain vectors, which are computed offline, integrate the two stages of beamforming and head-related transfer function (HRTF) filtering. As an alternative to the virtual speaker method, the proposed beamforming approach is independent of the number of virtual audio sources and flexible enough to work with different sets of HRTFs. A real-time system has been implemented based on the proposed method. Psychophysical hearing tests show good localization accuracy.
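The idea of folding beamforming and HRTF filtering into fixed per-channel filters can be sketched as follows. Assuming first-order cardioid beam weights toward a set of virtual-speaker directions and hypothetical HRIRs, each ear needs only four fixed FIR filters, one per B-format channel; the directions, weights, and FuMa-style W scaling here are illustrative assumptions, not the paper's optimal gain vectors.

```python
import numpy as np

def beam_weights(dirs):
    # Cardioid beams toward each (azimuth, elevation) direction, applied to
    # B-format channels (W, X, Y, Z); FuMa-style W scaling assumed.
    az, el = dirs[:, 0], dirs[:, 1]
    return np.stack([np.full_like(az, 0.5 * np.sqrt(2)),
                     0.5 * np.cos(az) * np.cos(el),
                     0.5 * np.sin(az) * np.cos(el),
                     0.5 * np.sin(el)], axis=1)        # shape (S, 4)

def bformat_binaural_filters(dirs, hrirs_L, hrirs_R):
    # Fold beamforming and HRTF filtering into one fixed FIR filter per
    # B-format channel and ear: filter = sum over speakers of weight * HRIR.
    W = beam_weights(dirs)
    return W.T @ hrirs_L, W.T @ hrirs_R                # (4, taps) each

def render(bfmt, hL, hR):
    # Binaural output is the sum of per-channel convolutions: four filters
    # per ear, regardless of how many virtual speakers were folded in.
    left = sum(np.convolve(bfmt[c], hL[c]) for c in range(4))
    right = sum(np.convolve(bfmt[c], hR[c]) for c in range(4))
    return left, right
```

By linearity of convolution this renders identically to decoding to virtual speakers first and convolving each feed with its HRIR, but the per-block cost no longer grows with the number of virtual sources, which is the independence property the abstract claims.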
Variable Source Radiation Pattern Synthesis for use in Two-Dimensional Sound Reproduction
In this paper the authors present an approach for two-dimensional sound reproduction using a circular layout of speakers where the gains are obtained from a variable polar pattern. The method presented here has the ability to be variable-order whilst keeping the same key features of a base polar pattern. Comparisons are drawn between the new approach and a previous approach by the authors using variable-order, variable-decoder Ambisonics. The new method is found to not be as directional as the Ambisonics approach, yet it maintains the base polar pattern, unlike Ambisonics. Whilst both approaches have two variable parameters, the new approach's parameters are independent and are therefore intuitive to an end user using such a tool as a spatialisation effect as well as a reproduction technique.
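A minimal sketch of the gain computation, assuming a cardioid-family base pattern with mix parameter `a` and a variable order (both parameter names are hypothetical): the pattern is evaluated at the angle between the source and each speaker, raised to the chosen order, and energy-normalised. Raising the order narrows the beam while the underlying base shape is preserved, which is the parameter-independence property noted above.

```python
import numpy as np

def pattern_gains(source_az, speaker_az, a=0.5, order=1.0):
    # Base first-order pattern a + (1 - a)cos(theta), clipped to suppress
    # negative lobes, raised to a variable order, then energy-normalised.
    g = np.maximum(a + (1 - a) * np.cos(np.asarray(speaker_az) - source_az),
                   0.0) ** order
    return g / np.sqrt((g ** 2).sum())

speakers = np.arange(8) * 2 * np.pi / 8      # regular circular layout
wide = pattern_gains(0.0, speakers, order=1.0)
narrow = pattern_gains(0.0, speakers, order=4.0)
```

Here `a` selects the base pattern (0.5 gives a cardioid) and the order controls directivity; changing one does not alter the role of the other.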
Energy-based calibration of virtual performance systems
A Virtual Performance System (VPS) is a real-time 3D auralisation system which allows a musician to play in simulated acoustic environments. Such systems have been used to investigate the effect of stage acoustics on the performance technique of musicians. This article describes the process of calibrating a VPS using energy-based quantities and goes on to verify this technique by comparing known acoustic quantities measured in a test space with a virtual version of the same space. This work has demonstrated that calibrating a VPS using metrics based on Support will result in an accurate simulation of a test space according to known acoustic metrics such as T30. A comparison of quantities referring to earlier parts of the response, such as Early Decay Time (EDT), shows some errors which are thought to be caused by the non-anechoic nature of the reproduction space.
Parametric Spatial Audio Effects
Parametric spatial audio coding methods aim to efficiently represent the spatial information of recordings with psychoacoustically relevant parameters. This study presents how these parameters can be manipulated in various ways to achieve a series of spatial audio effects that modify the spatial distribution of a captured or synthesised sound scene, or alter the relation of its diffuse and directional content. Furthermore, it is discussed how the same representation can be used for spatial synthesis of complex sound sources and scenes. Finally, it is argued that the parametric description provides an efficient and natural way of designing spatial effects.
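As a concrete illustration of such psychoacoustically relevant parameters, a DirAC-style analysis reduces a B-format frame to a direction of arrival and a diffuseness value; a spatial effect then amounts to editing those numbers (e.g. rotating directions, or scaling diffuseness to dry out or wash out the scene) before resynthesis. The conventions below (FuMa-style W scaling, a broadband single-frame time average) are simplifying assumptions, not the specific method of the paper.

```python
import numpy as np

def dirac_analysis(w, x, y, z):
    # The active intensity vector points toward the source; its magnitude
    # relative to the total energy density gives diffuseness
    # (0 = single plane wave, 1 = fully diffuse field).
    ix, iy, iz = (w * x).mean(), (w * y).mean(), (w * z).mean()
    energy = (w ** 2 + (x ** 2 + y ** 2 + z ** 2) / 2).mean()
    azimuth = np.arctan2(iy, ix)
    diffuseness = 1.0 - np.sqrt(2.0) * np.sqrt(ix**2 + iy**2 + iz**2) / energy
    return azimuth, diffuseness

# A plane wave from azimuth 0.7 rad, encoded with FuMa-style W = s / sqrt(2):
# analysis should recover the azimuth with near-zero diffuseness.
s = np.random.default_rng(2).standard_normal(4096)
az, psi = dirac_analysis(s / np.sqrt(2), s * np.cos(0.7), s * np.sin(0.7),
                         np.zeros_like(s))
```

A practical system would run this per time-frequency tile rather than broadband, which is what makes per-band direction and diffuseness editing possible.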
Binaural In-Ear Monitoring of Acoustic Instruments in Live Music Performance
A method for Binaural In-Ear Monitoring (BIEM) of acoustic instruments in live music is presented. Spatial rendering is based on four considerations: the directional radiation patterns of musical instruments, room acoustics, binaural synthesis with Head-Related Transfer Functions (HRTF), and the movements of both the musician's head and instrument. The concepts of static and dynamic sound mixes are presented and discussed according to the emotional involvement and musical instruments of the performers, as well as the use of motion capture technology. Pilot experiments of BIEM with dynamic mixing were done with amateur musicians performing with wireless headphones and a motion capture system in a small room. Listening tests with professional musicians evaluating recordings made under conditions of dynamic sound mixing were carried out to gauge initial reactions to BIEM. Ideas for further research in static sound mixing, individualized HRTFs, tracking techniques, as well as wedge-monitoring schemes are suggested.