Extraction and Removal of Percussive Sounds from Musical Recordings
Automated removal and extraction (isolation) of percussive sounds embedded in an audio signal is useful for a variety of applications, such as speech enhancement and music processing effects. A novel method is presented to accomplish both extraction and removal of beats, using an adaptive filter based on the LMS algorithm. Empirical evaluation using computer-generated music with a mix of natural voice and a repeating drum shows that the system is robust to different sound processing techniques such as non-linear distortion and tempo jitter.
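As a rough illustration of how an LMS adaptive filter can both extract and remove a repeating sound, here is a minimal sketch in the style of classic adaptive noise cancellation. It is not the authors' system: the availability of a `reference` signal correlated with the drum (for example a drum template triggered on each beat), the function name `lms_cancel`, and the values of `num_taps` and `mu` are all assumptions made for this example.

```python
def lms_cancel(primary, reference, num_taps=8, mu=0.01):
    """Adapt an FIR filter so the filtered reference tracks the part of
    `primary` that is correlated with `reference`.

    Returns (extracted, residual): `extracted` is the filter output (the
    estimated percussive part) and `residual` is the primary signal with
    that estimate removed. Both arguments and defaults are illustrative
    assumptions, not values from the paper.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    extracted, residual = [], []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # filter output
        e = d - y                                         # error signal
        # LMS weight update: step along the instantaneous gradient estimate
        weights = [w + 2.0 * mu * e * h for w, h in zip(weights, history)]
        extracted.append(y)
        residual.append(e)
    return extracted, residual
```

In this arrangement the same filter serves both purposes: the filter output is the extracted beat, and the error signal is the recording with the beat removed.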
Representations of Audio Signals in Overcomplete Dictionaries: What is the Link Between Redundancy Factor and Coding Properties?
This paper addresses the link between the size of the dictionary in overcomplete decompositions of signals and the rate-distortion properties when such decompositions are used for audio coding. We have performed several experiments with sets of nested dictionaries, showing that very redundant shift-invariant and multi-scale dictionaries have a clear benefit at low bit-rates. For very low distortion, however, a large number of atoms have to be encoded, and in these cases orthogonal transforms such as the MDCT give better results.
A Spatial Interface for Audio and Music Production
In an effort to find a better-suited interface for musical performance, a novel approach has been developed. At the heart of this approach is the concept of physical interaction with sound in space, where sound processing occurs at various 3-D locations and the sending of sound signals from one area to another is based on physical models of sound propagation. The control is based on a gestural vocabulary that is familiar to users, involving natural spatial interaction such as translating, rotating, and pointing in 3-D. This research presents a framework for real-time control of 3-D audio, and describes how to construct audio scenes to accomplish various musical tasks. The generality and effectiveness of this approach have enabled us to re-implement several conventional applications, with the benefit of a substantially more powerful interface, and have further led to the conceptualization of several novel applications.
Streaming Frequency-Domain DAFx in Csound 5
This article discusses the implementation of frequency-domain digital audio effects using the Csound 5 music programming language and its streaming frequency-domain signal (fsig) framework. Introduced in Csound 4.13 by Richard Dobson, the framework was further extended by Victor Lazzarini in version 5. The latest release of Csound incorporates a variety of new opcodes for different types of spectral manipulations. This article introduces the fsig framework and the analysis and resynthesis unit generators. It describes in detail the different types of spectral DAFx made possible by these new opcodes.
Real-Time Corpus-Based Concatenative Synthesis with CataRT
The concatenative real-time sound synthesis system CataRT plays grains from a large corpus of segmented and descriptor-analysed sounds according to proximity to a target position in the descriptor space. This can be seen as a content-based extension to granular synthesis providing direct access to specific sound characteristics. CataRT is implemented in Max/MSP using the FTM library and an SQL database. Segmentation and MPEG-7 descriptors are loaded from SDIF files or generated on-the-fly. CataRT makes it possible to explore the corpus interactively or via a target sequencer, to resynthesise an audio file or live input with the source sounds, or to experiment with expressive speech synthesis and gestural control.
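The idea of selecting grains by proximity to a target position in descriptor space can be sketched as a nearest-neighbour search. The example below is only an illustration in Python, not CataRT's Max/MSP implementation: the unit layout (a dict with `id` and `descriptors`), the choice of pitch and loudness as descriptors, the optional per-descriptor `weights`, and the weighted Euclidean distance are all assumptions.

```python
import math

def nearest_units(corpus, target, k=1, weights=None):
    """Return the k corpus units whose descriptors lie closest to `target`.

    Each unit is assumed to be a dict with a "descriptors" tuple of the
    same length as `target` (a hypothetical layout for this sketch).
    """
    if weights is None:
        weights = [1.0] * len(target)

    def distance(unit):
        # Weighted Euclidean distance in descriptor space
        return math.sqrt(sum(
            w * (d - t) ** 2
            for w, d, t in zip(weights, unit["descriptors"], target)
        ))

    return sorted(corpus, key=distance)[:k]

# Hypothetical corpus: descriptors are (pitch in Hz, loudness in dB)
corpus = [
    {"id": "a", "descriptors": (220.0, -20.0)},
    {"id": "b", "descriptors": (440.0, -10.0)},
    {"id": "c", "descriptors": (880.0, -6.0)},
]
selected = nearest_units(corpus, target=(450.0, -12.0))
print(selected[0]["id"])  # prints "b"
```

In practice the descriptors would usually be normalised or weighted so that no single dimension dominates the distance; the `weights` argument is one simple way to express that.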
Multichannel Signal Representation in PWGLSynth
This paper gives an overview of one of the most important features of our synthesis language, PWGLSynth. We concentrate on how to visually represent multichannel signals in a synthesis patch. PWGLSynth synthesis boxes support vectored inputs and outputs. This scheme is useful because it allows the construction of compound entities that are often used in sound synthesis, such as banks, parallel structures, and serial structures. PWGLSynth provides a rich set of tools for manipulating vectors. For instance, vectors can be mixed, modulated, merged, or split into sub-vectors.
Using Faust for FPGA Programming
In this paper we show the possibility of using FAUST (a programming language for function-based, block-oriented programming) to create a fast audio processor in a single-chip FPGA environment. The generated VHDL code is embedded in the on-chip processor system and uses the FPGA fabric for parallel processing. For the purpose of implementing and testing the code, a complete System-on-Chip framework has been created. We use a Digilent board with a Xilinx Virtex 2 Pro FPGA. The chip has a PowerPC 405 core, and the framework uses the on-chip peripheral bus to interface with the core. This paper presents a proof-of-concept implementation using a simple two-pole IIR filter. The generated code works, although more work is needed to support complex arithmetic operations.
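For readers unfamiliar with the proof-of-concept filter, a two-pole IIR filter can be written as the difference equation y[n] = b0·x[n] − a1·y[n−1] − a2·y[n−2]. The sketch below expresses that recursion in Python purely for illustration; it is not the paper's FAUST source or the generated VHDL, and the function name and coefficient values are assumptions chosen for the example.

```python
def two_pole_filter(samples, b0, a1, a2):
    """Apply y[n] = b0*x[n] - a1*y[n-1] - a2*y[n-2] to a list of samples."""
    y1 = 0.0  # y[n-1]
    y2 = 0.0  # y[n-2]
    out = []
    for x in samples:
        y = b0 * x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y  # shift the two-sample feedback state
    return out

# Impulse response for illustrative coefficients (poles inside the unit circle)
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(two_pole_filter(impulse, b0=1.0, a1=-0.5, a2=0.25))
# prints [1.0, 0.5, 0.0, -0.125, -0.0625]
```

The recursion needs only two multiply-accumulate steps on past outputs plus one gain, which is part of what makes it a convenient first test case for a hardware target.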
Parametric Coding of Stereo Audio Based on Principal Component Analysis
Low bit rate parametric coding of multichannel audio is mainly based on Binaural Cue Coding (BCC). Another multichannel audio processing method, called upmix, can also be used to deliver multichannel audio, typically 5.1 signals, at low data rates. More precisely, we focus on an existing upmix method based on Principal Component Analysis (PCA). This PCA-based upmix method aims to blindly create a realistic multichannel output signal, while the BCC scheme aims to perceptually restore the original multichannel audio signal. The PCA-based upmix method and the BCC scheme both use spatial parameters extracted from stereo channels to generate auditory events with correct spatial attributes, i.e. sound source positions and spatial impression. In this paper, we present a multichannel audio model based on PCA which allows a parametric representation of multichannel audio. For stereo audio, the signals resulting from PCA can be represented as a principal component, corresponding to directional sources, and one remaining signal, corresponding to ambience, both of which are related to the original input through the PCA transformation parameters. We apply the analysis results to propose a new parametric coding method for stereo audio based on subband PCA processing. The quantization of spatial and energy parameters is presented and then combined with a state-of-the-art monophonic coder to obtain subjective listening test results.
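The decomposition of a stereo pair into a principal component and a remaining ambience signal can be illustrated with a two-channel PCA, which reduces to a rotation by a single angle. The sketch below is a full-band, single-frame simplification and not the paper's subband coder: the function names, the use of raw second moments (the signals are assumed zero-mean), and the test signal are assumptions for this example.

```python
import math

def pca_rotate(left, right):
    """Rotate a stereo frame onto its principal axes.

    Returns (theta, principal, ambience), where theta is the rotation
    angle that maximises the energy of `principal`.
    """
    c_ll = sum(l * l for l in left)
    c_rr = sum(r * r for r in right)
    c_lr = sum(l * r for l, r in zip(left, right))
    # Angle of the dominant eigenvector of the 2x2 moment matrix
    theta = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    c, s = math.cos(theta), math.sin(theta)
    principal = [c * l + s * r for l, r in zip(left, right)]
    ambience = [-s * l + c * r for l, r in zip(left, right)]
    return theta, principal, ambience

def pca_unrotate(theta, principal, ambience):
    """Invert pca_rotate: recover (left, right) from the rotated signals."""
    c, s = math.cos(theta), math.sin(theta)
    left = [c * p - s * a for p, a in zip(principal, ambience)]
    right = [s * p + c * a for p, a in zip(principal, ambience)]
    return left, right

# A single amplitude-panned source: all energy lands in the principal signal
source = [math.sin(0.2 * n) for n in range(200)]
pan = 0.3  # panning angle in radians (illustrative)
left = [math.cos(pan) * v for v in source]
right = [math.sin(pan) * v for v in source]
theta, principal, ambience = pca_rotate(left, right)
```

For this panned source the estimated angle matches the panning angle and the ambience signal is numerically zero, which is why a coder of this kind could, in the directional case, transmit one signal plus the angle rather than two full channels.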
Exact Discrete-Time Realization of a Dolby B Encoding/Decoding Architecture
An algebraic technique which computes nonlinear, delay-free digital filter networks is applied to model Dolby B in discrete time. The model preserves the topology of the analog system, and imports the characteristics of the nonlinear processing blocks which are responsible for the peculiar functioning of Dolby B. The resulting numerical system exhibits qualitatively similar dynamic behavior and performance; full compliance with the Dolby B specifications would be achieved by deriving, from comprehensive data sheets of the system, accurate discrete-time models of the analog processing blocks. Results demonstrate that the computation converges if proper iterative methods are employed.
Real-Time Bayesian GSM Buzz Removal
In this paper we propose an iterative audio restoration algorithm based on an autoregressive (AR) model, with modeling of the noise pulse template, to detect and restore cell-phone electromagnetic interference (EMI) patterns known as “GSM buzz”. The algorithm is purely software-based and does not require any hardware to provide side information. The only assumption is that individual pulses are similar to scaled versions of the known template. With this assumption, the algorithm can fully detect and restore noisy interference signals in real time with almost no audible artifacts and improve the signal-to-noise ratio by as much as 50 dB.
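The assumption that each pulse is a scaled version of a known template can be illustrated with a plain least-squares fit. The sketch below is deliberately much simpler than the Bayesian AR method described in the abstract and is not a substitute for it: it assumes the pulse position is already known, that `frame` and `template` have the same length, and that the underlying audio is roughly uncorrelated with the template. The function name is hypothetical.

```python
def remove_scaled_template(frame, template):
    """Estimate the scale of `template` inside `frame` by least squares
    and subtract the scaled template.

    Returns (gain, residual). Assumes equal-length inputs and a non-zero
    template; both are simplifications made for this sketch.
    """
    energy = sum(t * t for t in template)
    gain = sum(x * t for x, t in zip(frame, template)) / energy
    residual = [x - gain * t for x, t in zip(frame, template)]
    return gain, residual

# Illustrative pulse shape and a frame containing it at twice the amplitude
template = [0.0, 1.0, -1.0, 0.5]
frame = [2.0 * t for t in template]
gain, residual = remove_scaled_template(frame, template)
print(gain)  # prints 2.0
```

When real audio is present underneath the pulse, the fitted gain is biased by whatever correlation the audio has with the template, which is one reason a signal model such as the AR model in the abstract is useful.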