Augmented reality audio (ARA) means mixing the natural sound environment with artificially created sound scenes. This requires that the perception of the natural environment be preserved as well as possible, unless some modification to it is desired. A basic ARA headset consists of binaural microphones, an amplifier/mixer, and earphones feeding sound to the ear canals. All of these components change the perceived sound scene to some degree. In this paper we describe an ARA headset, the equalization of its response, and in particular the results of a usability study. Usability was tested by subjects wearing the headset for relatively long periods in different everyday-life environments. The goal was to find out what works well and what problems arise in prolonged use. Acoustically, the headset was found to work well on most occasions when equalized either individually or generically (averaged over several subjects). The main usage problems were related to handling inconveniences and special environments.
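As a rough illustration of the individual versus generic equalization compared above, the sketch below derives an equalizer magnitude from measured headset responses. It assumes the responses and a target (for example, an open-ear reference) are already available in dB on a common frequency grid; averaging in dB, the function name `eq_gain_db`, and the boost limit `max_boost_db` are all assumptions, not the paper's procedure.

```python
import numpy as np

def eq_gain_db(responses_db, target_db, max_boost_db=12.0):
    """Equalizer gain (dB per frequency bin) that moves the mean response onto the target.

    responses_db: (n_subjects, n_bins) measured headset responses in dB.
                  Pass a single row to get an individual equalizer.
    target_db:    (n_bins,) desired response in dB (hypothetical reference).
    max_boost_db: assumed cap on boosts; cuts are left unlimited.
    """
    responses_db = np.atleast_2d(responses_db)
    mean_db = responses_db.mean(axis=0)        # generic EQ: average over subjects, in dB
    gain_db = target_db - mean_db              # correction needed in each bin
    return np.minimum(gain_db, max_boost_db)   # limit large boosts
```

With one subject the function reduces to an individual equalizer; with several it yields a generic one that fits no single subject exactly, which is the trade-off the usability study examines.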
This study explores the potential of utilising certain prosodic qualities of function-specific vocal expressions in order to design effective non-speech user interface sounds. In an empirical setting, utterances with four context-situated communicative functions were produced by 20 participants. Time series of fundamental frequency (F0) and intensity were extracted from the utterances and analysed statistically. The results show that individual communicative functions have distinct prosodic characteristics that can be statistically modelled. By using the model, certain function-specific prosodic cues can be identified and, in turn, imitated in the design of communicative interface sounds for the corresponding communicative functions in human-computer interaction.
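The extraction step can be sketched as follows: a frame-wise autocorrelation F0 estimate and an RMS intensity in dB. The frame and hop sizes, the 75 to 500 Hz search range, and the function name are assumptions for illustration; the study does not state which F0 tracker it used, and a production tracker would add voicing decisions and octave-error checks.

```python
import numpy as np

def prosody_tracks(x, fs, frame=1024, hop=256, fmin=75.0, fmax=500.0):
    """Return (f0_hz, intensity_db) arrays with one value per analysis frame."""
    lag_min = int(fs / fmax)                    # shortest period considered
    lag_max = int(fs / fmin)                    # longest period considered (must be < frame)
    win = np.hanning(frame)
    f0, inten = [], []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame] * win
        rms = np.sqrt(np.mean(seg ** 2))
        inten.append(20.0 * np.log10(rms + 1e-12))            # intensity in dB (arbitrary reference)
        ac = np.correlate(seg, seg, mode="full")[frame - 1:]  # autocorrelation for lags 0..frame-1
        if ac[0] <= 0.0:                                      # silent frame: no F0
            f0.append(0.0)
            continue
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))  # strongest periodicity
        f0.append(fs / lag)
    return np.array(f0), np.array(inten)
```

The resulting F0 and intensity contours are the kind of time series on which a per-function statistical model could then be fitted.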
In most practical applications, for the sake of information integrity, it is useful not only to detect whether multimedia content has been modified, but also to identify what kind of attack has been carried out. In the case of audio streams, for example, it may be useful to localize the tampering in the time and/or frequency domain. In this paper we devise a hash-based tampering detection and localization system exploiting compressive sensing principles. The multimedia content provider produces a small hash signature using a limited number of random projections of a time-frequency representation of the original audio stream. At the content user side, the hash signature is used to estimate the distortion between the original and the received stream and, provided that the tampering is sufficiently sparse or sparsifiable in some orthonormal basis expansion or redundant dictionary (e.g., DCT or wavelet), to identify the time-frequency portion of the stream that has been manipulated. In order to keep the hash length small, the algorithm exploits distributed source coding techniques.
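The projection and distortion-estimation steps can be sketched as below, assuming a magnitude STFT as the time-frequency representation and a Gaussian projection matrix generated from a seed shared by provider and user. Because each row has variance 1/m, the norm of the projected difference approximates the true time-frequency distortion. The function names, the seed mechanism, and all sizes are hypothetical; the sparse recovery used for localization and the distributed source coding of the hash are not shown.

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    """Magnitude STFT, shape (n_frames, frame // 2 + 1)."""
    win = np.hanning(frame)
    frames = [x[i:i + frame] * win for i in range(0, len(x) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def hash_signature(x, m=64, seed=1234):
    """Project the flattened time-frequency magnitude onto m random directions."""
    tf = stft_mag(x).ravel()
    rng = np.random.default_rng(seed)               # seed assumed known to both sides
    A = rng.standard_normal((m, tf.size)) / np.sqrt(m)
    return A @ tf

def distortion_estimate(h_orig, x_received, m=64, seed=1234):
    """Estimate ||TF(original) - TF(received)||_2 from the hashes alone."""
    return np.linalg.norm(h_orig - hash_signature(x_received, m, seed))
```

The user recomputes the hash of the received stream with the same matrix, so only the m numbers of `h_orig` need to travel with the content.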
Dispersion is a physical phenomenon that makes sound waves more or less inharmonic. Most physical sound synthesis models treat dispersion as a constant property that does not change during the course of a musical event. However, these models would be more expressive without such a restriction. This paper describes a dispersion amount parameter for precise control over inharmonicity, and then experiments with control-rate and audio-rate modulation of that parameter. We found that the inharmonicity of a plucked string could be smoothly controlled in real time, and that novel sonic material could be synthesized when the modulation rate was raised into the audio range. Instability of the string model for certain parameter values was found to be problematic.
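To make the idea of a time-varying dispersion amount concrete, the sketch below places a single first-order allpass inside a Karplus-Strong loop and lets its coefficient change every sample. This is a stand-in, not the paper's parameterization: practical dispersion filters usually cascade several allpass sections and compensate the tuning, and the names `plucked_string` and `disp` are hypothetical.

```python
import numpy as np

def plucked_string(disp, fs=44100, f0=220.0, loss=0.996, seed=0):
    """Karplus-Strong-style string with a first-order allpass in the loop.

    disp: per-sample allpass coefficient (|disp| < 1); its value changes the
          frequency-dependent loop delay and hence the partial frequencies.
    """
    n_samples = len(disp)
    L = int(fs / f0)                        # delay length; tuning is not compensated
    rng = np.random.default_rng(seed)
    buf = rng.uniform(-1.0, 1.0, L)         # noise-burst "pluck"
    out = np.empty(n_samples)
    lp_x1 = ap_x1 = ap_y1 = 0.0             # filter states
    idx = 0
    for n in range(n_samples):
        x = buf[idx]
        out[n] = x
        lp = 0.5 * (x + lp_x1)              # two-point average (loop lowpass)
        lp_x1 = x
        a = disp[n]                         # coefficient may change every sample
        y = a * lp + ap_x1 - a * ap_y1      # H(z) = (a + z^-1) / (1 + a z^-1)
        ap_x1, ap_y1 = lp, y
        buf[idx] = loss * y                 # feed back into the delay line
        idx = (idx + 1) % L
    return out
```

A slow ramp in `disp` corresponds to control-rate modulation; passing a fast sinusoid approximates audio-rate modulation. A time-varying direct-form allpass is not guaranteed to preserve energy, so fast, deep modulation in this sketch may need its depth limited.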
This paper presents an algorithm for simulating the sustain-pedal effect in piano synthesis using parallel second-order filters. A robust two-step filter design procedure, based on frequency-zooming ARMA modeling and least-squares fitting, is applied to calibrate the algorithm from impulse responses of the soundboard and the string register. The model takes into account the differences in coupling between the various strings. The algorithm can be applied to both sample-based and physics-based piano synthesizers.
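The least-squares step can be illustrated on its own: once the pole positions are fixed (here simply given; in the paper they come from the frequency-zooming ARMA stage), the numerators of parallel sections (b0 + b1 z^-1) / (1 + a1 z^-1 + a2 z^-2) enter linearly and can be fitted to a target impulse response. The function names are hypothetical, and the optional FIR path sometimes added to parallel filters is omitted.

```python
import numpy as np

def section_basis(poles, n):
    """Columns: impulse responses of 1/A_k(z) and z^-1/A_k(z) for each complex pole p_k
    (its conjugate is implied), with A_k(z) = 1 - 2 Re(p_k) z^-1 + |p_k|^2 z^-2."""
    cols = []
    for p in poles:
        a1, a2 = -2.0 * p.real, abs(p) ** 2
        h = np.zeros(n)
        h[0] = 1.0
        if n > 1:
            h[1] = -a1 * h[0]
        for k in range(2, n):
            h[k] = -a1 * h[k - 1] - a2 * h[k - 2]
        cols.append(h)
        cols.append(np.concatenate(([0.0], h[:-1])))  # same section delayed by one sample
    return np.column_stack(cols)

def fit_parallel(target, poles):
    """Least-squares numerators (b0, b1) per section; returns (coeffs, model response)."""
    M = section_basis(poles, len(target))
    coeffs, *_ = np.linalg.lstsq(M, target, rcond=None)
    return coeffs.reshape(-1, 2), M @ coeffs
```

Because the problem is linear in the numerators, the fit is a single least-squares solve, which is part of what makes this kind of two-step design robust.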
This paper discusses compact-stencil finite difference time domain (FDTD) schemes for approximating the 2D wave equation in the context of digital audio. Stability, accuracy, and efficiency are investigated, and new ways of viewing and interpreting the results are discussed. It is shown that if a tight accuracy constraint is applied, implicit schemes outperform explicit schemes. The paper also discusses the relevance to digital waveguide mesh modelling, and highlights the optimally efficient explicit scheme.
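For orientation, the sketch below implements only the simplest explicit member of this family: the standard 5-point leapfrog scheme on a square grid with fixed (zero) boundaries, whose stability limit is a Courant number c·Δt/Δx ≤ 1/√2. The grid size, the impulse excitation, and the function name are illustrative assumptions; the parameterized compact-stencil families and the implicit schemes studied in the paper are not reproduced.

```python
import numpy as np

def fdtd_2d(nx, ny, steps, courant):
    """Explicit 5-point scheme for u_tt = c^2 (u_xx + u_yy), u = 0 on the boundary.

    courant = c * dt / dx; the scheme is stable for courant <= 1/sqrt(2).
    Returns the final field and the peak |u| after each step.
    """
    lam2 = courant ** 2
    u = np.zeros((nx, ny))
    u[nx // 2, ny // 2] = 1.0                 # impulse excitation at the centre
    u_prev = u.copy()                         # zero initial velocity
    peak = []
    for _ in range(steps):
        lap = np.zeros_like(u)
        lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
                           - 4.0 * u[1:-1, 1:-1])
        u_next = 2.0 * u - u_prev + lam2 * lap
        u_next[0, :] = u_next[-1, :] = 0.0    # fixed boundaries
        u_next[:, 0] = u_next[:, -1] = 0.0
        u_prev, u = u, u_next
        peak.append(np.abs(u).max())
    return u, np.array(peak)
```

Running it with a Courant number just above 1/√2 shows the rapid blow-up of the highest-frequency grid modes that bounds the time step, and hence the efficiency, of explicit schemes.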
In this paper, the synthesis of plucked strings by means of physically inspired models is reconsidered in the context of the player's interaction with the virtual instrument. Although solutions for guitar tone synthesis that are excellent from an acoustic point of view have been proposed, the problem of letting the player control the physical parameters directly has not received sufficient attention. In this paper we revisit a simple model of the player's touch previously presented by Cuzzucoli and Lombardo. We show that the model is affected by an inconsistency that can be removed by introducing the finger/pick perturbation in a balanced form on the digital waveguide. The results, together with a more comprehensive model of the guitar, have been implemented in a VST plugin, which is the starting point for further research.
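One common reading of a "balanced" perturbation on a digital waveguide is d'Alembert's split: an initial displacement with zero velocity is divided equally between the right- and left-going rails, so that their sum reproduces the imposed shape. The sketch below uses that reading; whether it matches the paper's correction of the Cuzzucoli and Lombardo model is an assumption, and the function name and the rigid-termination coefficient `refl` are hypothetical.

```python
import numpy as np

def waveguide_pluck(shape, steps, pickup, refl=-0.99):
    """Ideal-string digital waveguide excited by an initial displacement `shape`.

    The perturbation is split equally between the two traveling-wave rails
    (zero initial velocity); the output is the physical displacement at `pickup`.
    """
    right = 0.5 * np.asarray(shape, dtype=float)  # right-going displacement wave
    left = 0.5 * np.asarray(shape, dtype=float)   # left-going displacement wave
    out = np.empty(steps)
    for n in range(steps):
        out[n] = right[pickup] + left[pickup]     # displacement = sum of rails
        end_r, end_l = right[-1], left[0]
        right = np.roll(right, 1)                 # propagate one tap to the right
        left = np.roll(left, -1)                  # propagate one tap to the left
        right[0] = refl * end_l                   # reflection at the left termination
        left[-1] = refl * end_r                   # reflection at the right termination
    return out
```

With an unequal split, the rails would encode a nonzero initial velocity in addition to the intended displacement, which is the kind of inconsistency a balanced injection avoids.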
Real-time bidirectional audio applications, such as microphones and monitor speakers in live performances, typically require communication systems with minimum latency. When digital transmission with a limited bit rate is desired, this places tight constraints on the algorithmic delay of the audio coding scheme. We present a delay-free approach employing adaptive differential pulse code modulation (ADPCM) and adaptive spectral shaping of the coding noise. To achieve zero-delay operation, both the prediction and the quantization logic of the ADPCM structure are realized in a backward-adaptive fashion. Noise shaping is accomplished via two feedback loops around the quantizer, for efficient exploitation of the auditory selectivity and masking phenomena, respectively. Owing to automatic optimization of the parameters involved, the performance of the proposed system is on par with that of prior low-delay approaches.
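The zero-delay property hinges on backward adaptation: encoder and decoder update the predictor and the step size from already decoded data only, so neither look-ahead nor side information is required. The toy codec below shows just that mechanism, using an NLMS predictor and a Jayant-style step-size adaptation with textbook 4-bit multipliers. These choices, the class name, and all parameter values are assumptions; the paper's two noise-shaping feedback loops and its parameter optimization are not modeled.

```python
import numpy as np

JAYANT_4BIT = np.array([0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4])  # step multiplier per |code|

class BackwardADPCM:
    """Toy zero-delay ADPCM: every adaptation uses decoded samples only."""

    def __init__(self, order=8, mu=0.05):
        self.w = np.zeros(order)      # predictor coefficients
        self.hist = np.zeros(order)   # past reconstructed samples, newest first
        self.step = 0.01              # quantizer step size
        self.mu = mu                  # NLMS adaptation rate

    def _update(self, code, pred):
        e_hat = code * self.step                        # dequantized prediction residual
        recon = pred + e_hat                            # decoded sample
        norm = self.hist @ self.hist + 1e-6
        self.w += self.mu * e_hat * self.hist / norm    # NLMS driven by decoded data
        self.hist = np.roll(self.hist, 1)
        self.hist[0] = recon
        mult = JAYANT_4BIT[min(abs(code), 7)]           # grow step for large codes, shrink for small
        self.step = float(np.clip(self.step * mult, 1e-5, 1.0))
        return recon

    def encode(self, x):
        pred = float(self.w @ self.hist)
        code = int(np.clip(np.round((x - pred) / self.step), -8, 7))  # 4-bit code
        return code, self._update(code, pred)

    def decode(self, code):
        pred = float(self.w @ self.hist)
        return self._update(code, pred)
```

Since the decoder performs exactly the same updates as the encoder's local reconstruction, both stay synchronized sample by sample, and each decoded sample is available as soon as its code arrives.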
Nonlinear effects in ultrasound propagation can be used to generate highly directive audible sound. To do so, an ultrasonic carrier is amplitude-modulated by the audio signal and sent to an ultrasound transducer. When played back at a sufficiently high sound pressure level, the ultrasonic signal is self-demodulated owing to the nonlinear behavior of the medium. The resulting signal has two important characteristics: it is audible, and it has the same directivity properties as the ultrasonic carrier. In this paper we describe the theoretical advantages of single-sideband (SSB) modulation over a standard amplitude modulation (AM) scheme for this application. We describe our near-field sound-field measurement experiments, and propose steering solutions for the array using two different types of transducers, piezoelectric or electrostatic, together with the appropriate supporting hardware.
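The difference between the two modulation schemes can be seen in a short sketch: AM produces both sidebands around the carrier, whereas forming the analytic signal of the audio (zeroing its negative frequencies, the same construction `scipy.signal.hilbert` uses) and multiplying by a complex carrier keeps only the upper sideband. The modulation index `m`, the carrier-plus-USB form, and the function names are assumptions for illustration, not the paper's exact signal chain.

```python
import numpy as np

def analytic(x):
    """Analytic signal via FFT: keep DC (and Nyquist), double positive, zero negative bins."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def am(x, fc, fs, m=0.8):
    """Double-sideband AM with carrier: (1 + m x) cos(w_c t)."""
    t = np.arange(len(x)) / fs
    return (1.0 + m * x) * np.cos(2 * np.pi * fc * t)

def ssb_usb(x, fc, fs, m=0.8):
    """Carrier plus upper sideband only: Re{(1 + m x_a) exp(j w_c t)}."""
    t = np.arange(len(x)) / fs
    return np.real((1.0 + m * analytic(x)) * np.exp(1j * 2 * np.pi * fc * t))
```

For a 1 kHz tone on a 40 kHz carrier, the AM spectrum shows equal lines at 39 and 41 kHz, while the SSB version keeps only the 41 kHz line.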
This article provides an overview of further methods for producing hybrid natural-synthetic spectra with adaptive frequency modulation (AdFM). It focuses on three different techniques for the generation of asymmetric spectra, based on single-sideband FM, asymmetric FM and split-sideband synthesis. The first two techniques are applied to the variable delay line implementation of AdFM, whereas the third is based on an extension of the heterodyne method. The article discusses the principles involved in each synthesis technique in detail, providing a reference implementation for each. A number of examples are discussed, demonstrating the possibilities for a variety of digital audio effects applications.
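The variable delay line mentioned above can be sketched generically: the input is read through a delay whose length oscillates sinusoidally, which phase-modulates every component of the input. Here the modulator frequency `fm` is a free parameter; deriving it from the input itself, as an adaptive scheme would, is left out, and the function name and linear interpolation are assumptions rather than the article's reference implementation.

```python
import numpy as np

def vdl_modulate(x, fs, fm, depth_samples, base_delay):
    """y[n] = x[n - d[n]] with d[n] = base_delay + depth_samples * sin(2*pi*fm*n/fs).

    Requires base_delay >= depth_samples so the delay never becomes negative.
    Fractional delays are read with linear interpolation; samples before the
    start of x are taken as zero.
    """
    n = np.arange(len(x))
    d = base_delay + depth_samples * np.sin(2 * np.pi * fm * n / fs)
    pos = n - d                                   # fractional read position
    i0 = np.floor(pos).astype(int)
    frac = pos - i0

    def tap(idx):
        valid = (idx >= 0) & (idx < len(x))
        return np.where(valid, x[np.clip(idx, 0, len(x) - 1)], 0.0)

    return (1.0 - frac) * tap(i0) + frac * tap(i0 + 1)
```

With zero depth the function reduces to a plain integer delay, which gives a simple sanity check; raising `depth_samples` increases the effective modulation index and spreads energy into sidebands.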