Multimodal Interfaces for Expressive Sound Control
This paper introduces research issues on multimodal interaction and interfaces for expressive sound control. We introduce Multisensory Integrated Expressive Environments (MIEEs) as a framework for Mixed Reality applications in the performing arts. Paradigmatic contexts for MIEE applications are multimedia concerts; interactive dance, music, and video installations; interactive museum exhibitions; and distributed cooperative environments for theatre and artistic expression. MIEEs are user-centred systems able to interpret the high-level information conveyed by performers through their expressive gestures and to establish an effective multisensory experience, taking into account expressive, emotional, and affective content. The lecture discusses the main issues for MIEEs and presents the EyesWeb (www.eyesweb.org) open software platform, which has recently been redesigned (version 4) to better address MIEE requirements. Short live demonstrations are also presented.
The Sounding Gesture: An Overview
Sound control by gesture is a peculiar topic in Human-Computer Interaction: many different approaches are available, each focusing on a different perspective. Our point of view is interdisciplinary: taking into account technical considerations from control theory and sound processing, we explore the domain of expressiveness, which lies closer to psychological theory. Starting from a state of the art that outlines two main approaches to the problem of “making sound with gestures”, we delve into psychological theories of expressiveness, describing in particular possible applications dealing with intermodality and mixed-reality environments related to Gestalt theory. HCI design can benefit from this kind of approach because quantitative methods can be applied to measure expressiveness. Interfaces can be used to convey expressiveness, an additional layer of information that can aid interaction with the machine; this kind of information can be coded as spatio-temporal schemes, as Gestalt theory states.
Effect of Latency on Playing Accuracy of Two Gesture Controlled Continuous Sound Instruments Without Tactile Feedback
The paper reports results from an experimental study quantifying how latency affects the playing accuracy of two continuous sound instruments. Eleven subjects played a conventional Theremin and a virtual-reality Theremin. Both instruments provided the user with audio feedback only. The subjects performed two tasks under different instrument latencies: they attempted to match the pitch of the instrument to a sample pitch, and they played along with a short sample melody and a metronome. Both the sample sound and the instrument's sound were recorded on separate channels of a sound file. The pitch of the sounds was then extracted and user performance analyzed. The results show that the time required to match a given pitch increases by about five times the introduced latency, suggesting that the feedback latency accumulates over the whole task. Errors while playing along with the sample melody increased by 80% on average at the highest latency of 240 ms. Latencies up to 120 ms increased the errors only slightly.
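The paper's analysis pipeline (pitch extraction from the two recorded channels, then timing and error measurement) is not given as code; below is a minimal sketch of one such measure, time-to-match-pitch, assuming a per-frame pitch track has already been extracted. Function names, the tolerance, and the toy data are hypothetical, not the study's actual parameters.

```python
import numpy as np

def time_to_match(pitch_track, target_hz, tol_cents=50.0, hop_s=0.01):
    """Return the time (s) at which the played pitch settles within
    `tol_cents` of the target and stays there until the end of the take.
    `pitch_track` is a per-hop fundamental-frequency estimate in Hz."""
    cents = 1200.0 * np.log2(pitch_track / target_hz)
    inside = np.abs(cents) <= tol_cents
    # The match time is just after the last frame still outside tolerance.
    outside_idx = np.nonzero(~inside)[0]
    if len(outside_idx) == 0:
        return 0.0
    return (outside_idx[-1] + 1) * hop_s

# Toy example: an exponential approach to a 440 Hz target pitch.
t = np.arange(0.0, 3.0, 0.01)
track = 440.0 * 2 ** (0.5 * np.exp(-t / 0.4))  # starts ~a tritone sharp
print(time_to_match(track, 440.0))             # ~1.0 s for this toy track
```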
Audio-Based Gesture Extraction on the ESITAR Controller
Using sensors to extract gestural information for control parameters of digital audio effects is common practice. There has also been research using machine-learning techniques to classify specific gestures based on audio feature analysis. In this paper, we describe our experiments in training a computer to map audio-based features onto the corresponding sensor data, in order to potentially eliminate the need for sensors. Specifically, we show our experiments using the ESitar, a digitally enhanced sensor-based controller modeled after the traditional North Indian sitar. We utilize multivariate linear regression to map continuous audio features to continuous gestural data.
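The core step here is a multivariate linear regression from continuous audio features to continuous gesture (sensor) data. As a hedged illustration, the following numpy sketch fits and applies such a mapping on synthetic stand-ins; the ESitar's actual feature set and training recordings are not reproduced.

```python
import numpy as np

# Hypothetical shapes: one row per analysis frame.
# X holds audio features (e.g. centroid, RMS, MFCCs); Y holds sensor channels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))            # stand-in for extracted audio features
true_W = rng.normal(size=(13, 3))          # hidden linear relation + bias row
Y = np.hstack([X, np.ones((1000, 1))]) @ true_W + 0.01 * rng.normal(size=(1000, 3))

# Fit Y ~ [X, 1] W by least squares: multivariate linear regression.
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

# Predict "virtual sensor" values for frames, replacing the physical sensors.
Y_hat = Xb @ W
print(np.mean((Y_hat - Y) ** 2))           # small residual on this toy data
```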
Sparse and Structured Decompositions of Audio Signals in Overcomplete Spaces
We investigate the notion of “sparse decompositions” of audio signals in overcomplete spaces, i.e., when the number of basis functions is greater than the number of signal samples. We show that, with a low degree of overcompleteness (typically 2 or 3 times), it is possible to obtain good sparse approximations of the signal, provided that some “structural” information is taken into account, i.e., the localization of significant coefficients, which appear to form clusters. This is illustrated with decompositions on a union of local cosines (MDCT) and discrete wavelets (DWT), which are shown to perform well on percussive signals, a class of signals that is difficult to represent sparsely on pure (local) Fourier bases. Finally, the obtained clusters of individual atoms are shown to carry higher-level information, such as a parametrization of partials or attacks, which is potentially useful in an information-retrieval context.
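As a rough illustration of decomposing in an overcomplete space, the sketch below runs plain matching pursuit over a roughly 2x-overcomplete union of cosine atoms and Haar-like atoms on a toy signal with a transient. The paper's structured, cluster-aware coefficient selection and its actual MDCT/DWT dictionaries are not reproduced; this only shows why such a union helps with tonal-plus-percussive material.

```python
import numpy as np

def mp(signal, D, n_atoms=40):
    """Plain matching pursuit: greedily pick the dictionary column most
    correlated with the residual. D is assumed to have unit-norm columns."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        c = D.T @ residual
        k = np.argmax(np.abs(c))
        coeffs[k] += c[k]
        residual -= c[k] * D[:, k]
    return coeffs, residual

N = 256
# Union of a cosine basis and fine-scale Haar-like atoms: ~2x overcomplete.
cos_basis = np.cos(np.pi * np.outer(np.arange(N) + 0.5, np.arange(N)) / N)
cos_basis /= np.linalg.norm(cos_basis, axis=0)
haar = np.zeros((N, N))
for i in range(0, N, 2):
    haar[i, i], haar[i + 1, i] = 1, -1          # difference atoms catch attacks
    haar[i, i + 1], haar[i + 1, i + 1] = 1, 1   # local averages
haar /= np.linalg.norm(haar, axis=0)
D = np.hstack([cos_basis, haar])

x = np.sin(2 * np.pi * 8 * np.arange(N) / N)    # tonal part
x[100] += 2.0                                   # percussive transient
coeffs, res = mp(x, D)
print(np.count_nonzero(coeffs), np.linalg.norm(res) / np.linalg.norm(x))
```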
Detection of Clicks Using Sinusoidal Modeling for the Confirmation of the Clicks
This article presents methods for click detection in degraded audio recordings. It begins with a brief description of the method first implemented for detecting clicks in audio sources, based on linear prediction. To improve on the results obtained with this method, we propose a confirmation method based on sinusoidal modeling, which discards clicks that were wrongly detected. This allows small-amplitude clicks to be detected while avoiding false detections. The results obtained by this method are shown, confirming that it performs well. Finally, the method implemented for detecting clicks in naturally degraded audio sources is presented.
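Only the first, linear-prediction stage lends itself to a compact sketch; the sinusoidal-modeling confirmation step is not shown here. Below is a generic LP-residual click detector in numpy: samples whose prediction error exceeds a robust threshold are flagged. The least-squares LP fit and the threshold constant are illustrative choices, not the article's exact method.

```python
import numpy as np

def lp_residual(x, order=16):
    """Prediction residual of a linear-prediction model fitted by least
    squares: each sample is predicted from the `order` samples before it."""
    rows = np.stack([x[i:i + order] for i in range(len(x) - order)])
    a, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
    return x[order:] - rows @ a

def detect_clicks(x, order=16, k=5.0):
    """Flag sample indices whose LP residual exceeds k robust sigmas.
    A few samples trailing each click may also trip the threshold; the
    paper's sinusoidal-modeling stage exists to confirm true detections."""
    r = lp_residual(x, order)
    sigma = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD-based std
    return np.nonzero(np.abs(r) > k * sigma)[0] + order

rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 4000)) + 0.01 * rng.normal(size=4000)
x[1234] += 0.5                                  # synthetic click
print(detect_clicks(x))                         # indices near 1234
```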
Audio Analysis, Visualization, and Transformation with the Matching Pursuit Algorithm
The matching pursuit (or MP) algorithm decomposes audio data into a collection of thousands of constituent sound particles, or gaborets. These particles correspond to the “quantum” or granular model of sound posited by Dennis Gabor. This robust and high-resolution analysis technique creates new possibilities for sound visualization and transformation. This paper presents an account of a first round of experiments with MP-based visualization and transformation techniques.
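A hedged sketch of the underlying idea: greedy matching pursuit over a small dictionary of Gabor atoms ("gaborets"), each parametrized by time, frequency, and scale, so every selected atom is directly plottable as a time-frequency particle. The grid and atom counts below are arbitrary toy values, not those of the experiments reported in the paper.

```python
import numpy as np

def gabor(N, center, freq, width):
    """A unit-norm Gabor atom: a Gaussian-windowed sinusoid."""
    n = np.arange(N)
    g = np.exp(-0.5 * ((n - center) / width) ** 2) * np.cos(2 * np.pi * freq * n)
    return g / np.linalg.norm(g)

N = 512
# Small Gabor dictionary over a coarse time / frequency / scale grid.
params = [(c, f, w) for c in range(0, N, 64)
                    for f in np.linspace(0.01, 0.45, 12)
                    for w in (8, 32, 128)]
D = np.stack([gabor(N, *p) for p in params], axis=1)

# Toy signal built from two on-grid gaborets.
x = gabor(N, 256, 0.09, 32) + 0.5 * gabor(N, 128, 0.29, 8)
residual, atoms = x.copy(), []
for _ in range(5):
    c = D.T @ residual
    k = int(np.argmax(np.abs(c)))
    atoms.append((params[k], float(c[k])))      # (time, freq, scale), weight
    residual -= c[k] * D[:, k]

# Each entry is directly plottable as a time-frequency "particle".
for (center, freq, width), w in atoms:
    print(f"t={center:4d}  f={freq:.3f}  scale={width:4d}  weight={w:+.3f}")
```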
Removing Crackle from an LP Record via Wavelet Analysis
The familiar “crackle” is one of the undesirable phenomena encountered in LP records. Wavelet analysis offers an alternative approach to removing it in the restoration process. In the paper, the principle of this method is described. A theoretical discussion of how the selection of the wavelet basis affects the quality of the restoration is also included.
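As an illustration of the general approach (not the paper's specific method or its basis-selection analysis), here is a sketch using PyWavelets: decompose the signal, zero outlier detail coefficients at each scale, where short impulsive crackle concentrates, and reconstruct. The wavelet, decomposition level, and threshold constant are assumptions.

```python
import numpy as np
import pywt

def remove_crackle(x, wavelet="db4", level=5, k=5.0):
    """Zero detail coefficients that are outliers at their scale, then
    reconstruct. k sets the threshold in robust standard deviations."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    cleaned = [coeffs[0]]                              # keep the approximation
    for d in coeffs[1:]:
        sigma = 1.4826 * np.median(np.abs(d)) + 1e-12  # MAD scale estimate
        d = d.copy()
        d[np.abs(d) > k * sigma] = 0.0                 # suppress impulsive spikes
        cleaned.append(d)
    return pywt.waverec(cleaned, wavelet)

rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * 3 * np.linspace(0, 1, 4096))
x = clean.copy()
x[rng.integers(0, 4096, 20)] += rng.choice([-1, 1], 20) * 0.8  # add crackle
y = remove_crackle(x)
print(np.max(np.abs(y - clean)))   # residual error after restoration
```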
Spectral Delays with Frequency Domain Processing
In this paper the author presents preliminary research on spectral delays using frequency-domain processing. A Max/MSP patch is presented in which it is possible to delay individual bins of a Fourier transform, and several musically interesting applications of the patch, including the ability to create distinct spatial images and spectral trajectories, are outlined.
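The paper implements this as a Max/MSP patch; as a language-neutral sketch of the same idea, the Python fragment below delays each STFT bin by its own number of frames using scipy. The window size and the linear delay ramp are arbitrary choices.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_delay(x, fs, delays_frames, nperseg=1024):
    """Delay each FFT bin independently by a whole number of STFT frames.
    `delays_frames[k]` is the frame delay applied to bin k."""
    f, t, Z = stft(x, fs, nperseg=nperseg)
    out = np.zeros_like(Z)
    for k in range(Z.shape[0]):
        d = int(delays_frames[k])
        out[k, d:] = Z[k, :Z.shape[1] - d]     # shift bin k later in time
    _, y = istft(out, fs, nperseg=nperseg)
    return y

fs = 44100
x = np.random.default_rng(3).normal(size=fs)   # one second of noise
nbins = 1024 // 2 + 1
delays = np.linspace(0, 40, nbins).astype(int) # low bins dry, highs trail
y = spectral_delay(x, fs, delays)              # a rising spectral trajectory
print(y.shape)
```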
Exponential Weighting Method for Sample-by-Sample Update of Warped AR-Model
Auto-regressive (AR) modeling is a powerful tool with many applications in audio signal processing. The modeling procedure can be focused on the low- or high-frequency range using frequency warping. Conventionally, AR modeling is performed frame by frame, which introduces latency: as with any frame-by-frame algorithm, the full frame has to be available before any output can be produced. This latency makes AR modeling more or less unusable in real-time sound effects, especially when long frame lengths are required. In this paper we introduce an exponential weighting (EW) method for sample-by-sample update of the warped AR model. This method reduces the latency down to the order of the AR model.
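The exponential-weighting update amounts to recursive least squares with a forgetting factor, applied once per sample so no frame ever needs to be buffered. The sketch below shows that recursion for an ordinary (unwarped) AR model; the paper's frequency warping, which replaces the unit delays with first-order allpass sections, is not reproduced here.

```python
import numpy as np

def ew_ar_update(x, order=8, lam=0.999):
    """Sample-by-sample AR estimation with exponential weighting
    (recursive least squares with forgetting factor `lam`). Yields the
    current coefficient vector after every sample: no frame latency."""
    P = 1e3 * np.eye(order)              # inverse-correlation matrix estimate
    a = np.zeros(order)                  # AR coefficients (prediction weights)
    for n in range(order, len(x)):
        buf = x[n - order:n][::-1]       # most recent samples, newest first
        e = x[n] - a @ buf               # a-priori prediction error
        g = P @ buf / (lam + buf @ P @ buf)       # RLS gain vector
        a = a + g * e
        P = (P - np.outer(g, buf @ P)) / lam      # forgetting-factor update
        yield a

# Synthesize a known AR(2) process and track its coefficients.
rng = np.random.default_rng(4)
x = np.zeros(2000)
for n in range(2, 2000):
    x[n] = 1.6 * x[n - 1] - 0.81 * x[n - 2] + 0.1 * rng.normal()
for a in ew_ar_update(x, order=2):
    pass
print(a)                                 # should approach [1.6, -0.81]
```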