Analysis and resynthesis of quasi-harmonic sounds: an iterative filterbank approach
We employ a hybrid state-space sinusoidal model for general use in analysis-synthesis based audio transformations. This model, which has appeared previously in altered forms (e.g. [5], [8], perhaps others), combines the advantages of a source-filter model with the flexible, time-frequency based transformations of the sinusoidal model. For this paper, we specialize the parameter identification task to a class of “quasi-harmonic” sounds. The latter represent a variety of acoustic sources in which multiple, closely spaced modes cluster about principal harmonics loosely following a harmonic structure (some inharmonicity is allowed). To estimate the sinusoidal parameters, an iterative filterbank splits the signal into subbands, one per principal harmonic. Each filter is optimally designed by a linear programming approach to be concave in the passband, monotonic in transition regions, and to specifically null out sinusoids in other subband regions. Within each subband, the constant frequencies and exponential decay rates of each mode are estimated by a Steiglitz-McBride approach, then time-varying amplitudes and phases are tracked by a Kalman filter. The instantaneous phase estimate is used to derive an average instantaneous frequency estimate; the latter, averaged over all modes in the subband region, updates the filter’s center frequency for the next iteration. In this way, the filterbank structure progressively adapts to the specific inharmonicity structure of the source recording. Analysis-synthesis applications are demonstrated with standard (time/pitch-scaling) transformation protocols, as well as some possibly novel effects facilitated by the “source-filter” aspect.
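To make the tracking-and-update loop concrete, here is a minimal sketch of the amplitude/phase tracking step and the center-frequency update. It uses a scalar complex-state Kalman filter (my own simplification, not the paper's exact state-space model); the variable names, noise levels, and the synthetic test signal are illustrative assumptions.

```python
import numpy as np

def track_mode(y, w_est, d_est, q=1e-5, r=1e-2):
    """Scalar complex Kalman filter tracking one decaying mode.

    y     : complex (analytic) subband signal
    w_est : estimated mode frequency in rad/sample (e.g. from Steiglitz-McBride)
    d_est : estimated decay rate per sample
    Returns per-sample complex state estimates (amplitude and phase).
    """
    F = np.exp(-d_est + 1j * w_est)        # deterministic decay + rotation
    s, P = y[0], 1.0                       # initial state and variance
    s_hist = np.empty(len(y), dtype=complex)
    for n, yn in enumerate(y):
        s_pred = F * s                     # predict
        P_pred = abs(F) ** 2 * P + q
        K = P_pred / (P_pred + r)          # Kalman gain
        s = s_pred + K * (yn - s_pred)     # correct with the observation
        P = (1.0 - K) * P_pred
        s_hist[n] = s
    return s_hist

# Synthetic decaying sinusoid, deliberately off the assumed frequency.
n = np.arange(4000)
w_true, d_true = 0.21, 2e-4
y = np.exp((-d_true + 1j * w_true) * n) + 0.03 * (
    np.random.randn(n.size) + 1j * np.random.randn(n.size))

s = track_mode(y, w_est=0.20, d_est=2e-4)
inst_freq = np.diff(np.unwrap(np.angle(s)))   # instantaneous frequency per sample
w_next = inst_freq.mean()                     # feeds the next filterbank iteration
print(f"updated center frequency: {w_next:.4f} (true {w_true})")
```

In the paper's scheme, this per-mode mean instantaneous frequency would then be averaged over all modes in the subband before re-centering the subband filter.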
The CATERPILLAR system for data-driven concatenative sound synthesis
Concatenative data-driven synthesis methods are gaining more interest for musical sound synthesis and effects. They are based on a large database of sounds and a unit selection algorithm which finds the units that best match a given sequence of target units. We describe related work and our CATERPILLAR synthesis system, focusing on recent new developments: the advantages of the addition of a relational SQL database, work on segmentation by alignment, the reformulation and extension of the unit selection algorithm using a constraint resolution approach, and new applications for musical and speech synthesis.
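For readers unfamiliar with unit selection, here is a sketch of the classical path-search formulation that the constraint-resolution approach reformulates: a Viterbi-style dynamic program minimizing the sum of target costs (database unit vs. target unit) and concatenation costs (discontinuity between consecutive units). The feature representation and Euclidean costs are illustrative assumptions, not CATERPILLAR's actual cost functions.

```python
import numpy as np

def select_units(targets, units, w_target=1.0, w_concat=1.0):
    """Viterbi-style unit selection over feature arrays (rows = units)."""
    T, U = len(targets), len(units)
    # target cost: distance between each target and each database unit
    tcost = np.linalg.norm(targets[:, None, :] - units[None, :, :], axis=2)
    # concatenation cost: feature discontinuity between consecutive units
    ccost = np.linalg.norm(units[:, None, :] - units[None, :, :], axis=2)
    cost = np.full((T, U), np.inf)
    back = np.zeros((T, U), dtype=int)
    cost[0] = w_target * tcost[0]
    for t in range(1, T):
        total = cost[t - 1][:, None] + w_concat * ccost   # U x U transitions
        back[t] = np.argmin(total, axis=0)                # best predecessor
        cost[t] = total[back[t], np.arange(U)] + w_target * tcost[t]
    path = [int(np.argmin(cost[-1]))]                     # backtrack best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(1)
units = rng.random((50, 3))     # 50 database units, 3 features each
targets = rng.random((8, 3))    # 8 target units to match
print(select_units(targets, units))
```

The constraint formulation described in the paper replaces this fixed left-to-right optimization with a more flexible search over the same kinds of costs.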
Enhanced partial tracking using linear prediction
In this paper, we introduce a new partial tracking method suitable for the sinusoidal modeling of mixtures of instrumental sounds with pseudo-stationary frequencies. This method, based on the linear prediction of the frequency evolutions of the partials, enables us to track these partials more accurately at the analysis stage, even in complex sound mixtures. This allows our spectral model to better handle polyphonic sounds.
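The core idea lends itself to a short sketch: fit linear-prediction coefficients to a partial's recent frequency history, predict the next frequency, and link the partial to the spectral peak closest to that prediction. The order, window, and tolerance below are illustrative, not the paper's settings.

```python
import numpy as np

def lp_predict(track, order=2):
    """One-step-ahead frequency prediction via least-squares LP:
    x[n] ~ a[0]*x[n-1] + a[1]*x[n-2] + ..."""
    x = np.asarray(track, dtype=float)
    rows = [x[i:i + order][::-1] for i in range(len(x) - order)]
    A, b = np.array(rows), x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a @ x[-order:][::-1]

def continue_partial(track, peaks, max_dev=20.0):
    """Link the partial to the measured peak nearest its predicted
    frequency, or return None if no peak is within max_dev Hz."""
    f_pred = lp_predict(track)
    best = min(peaks, key=lambda f: abs(f - f_pred))
    return best if abs(best - f_pred) <= max_dev else None

track = [440.0, 441.2, 442.1, 443.3, 444.0]   # recent frequency history (Hz)
print(continue_partial(track, peaks=[220.0, 445.1, 880.0]))  # -> 445.1
```

Prediction-based linking is what lets the tracker follow a slowly gliding partial through a mixture instead of jumping to a nearby partial of another note.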
Direct estimation of frequency from MDCT-encoded files
The Modified Discrete Cosine Transform (MDCT) is a broadly used transform for audio coding, since it allows an orthogonal time-frequency transform without blocking effects. In this article, we show that the MDCT can also be used as an analysis tool. This is illustrated by extracting the frequency of a pure sine wave with some simple combinations of MDCT coefficients. We study the performance of this estimation in ideal (noiseless) conditions, as well as the influence of additive noise (white noise / quantization noise). This forms the basis of low-level feature extraction directly in the compressed domain.
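For orientation, here is a direct reference implementation of the MDCT with a sine window, plus a coarse bin-resolution frequency read-off. The paper's actual estimator refines this by combining neighboring coefficients, which is not reproduced here; the O(N^2) transform and the peak-bin estimate are only illustrative.

```python
import numpy as np

def mdct(frame):
    """Direct MDCT of one 2N-sample frame with a sine window
    (O(N^2) reference implementation, for illustration only)."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    w = np.sin(np.pi / (2 * N) * (n + 0.5))             # sine window
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ (w * frame)

fs, N = 44100.0, 1024
t = np.arange(2 * N) / fs
X = mdct(np.sin(2 * np.pi * 3000.0 * t))
k_max = np.argmax(np.abs(X))
# coarse estimate, limited to bin resolution fs/(2N) ~ 21.5 Hz here
print(f"coarse estimate: {(k_max + 0.5) * fs / (2 * N):.1f} Hz")  # ~3000 Hz
```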
On sinusoidal parameter estimation
This paper reviews the issues surrounding sinusoidal parameter estimation, which is a vital part of many audio manipulation algorithms. A number of algorithms which use the phase of the Fourier transform for estimation (e.g. [1]) are explored and shown to be identical. Their performance against a classical interpolation estimator [2] and comparison with the Cramér-Rao Bound (CRB) is presented. Component detection is also considered, and various methods of improving these algorithms are discussed.
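One member of the phase-based family the paper reviews can be sketched in a few lines: take two DFT frames a known hop apart, and refine the peak bin's frequency from the measured phase advance. The window, frame length, and hop below are arbitrary choices for the demonstration.

```python
import numpy as np

def phase_freq_estimate(x, fs, N=2048, hop=1):
    """Refine the peak DFT bin's frequency from the phase advance
    between two frames `hop` samples apart."""
    X1 = np.fft.rfft(np.hanning(N) * x[:N])
    X2 = np.fft.rfft(np.hanning(N) * x[hop:hop + N])
    k = np.argmax(np.abs(X1))                     # peak bin
    expected = 2 * np.pi * k * hop / N            # phase advance if exactly on-bin
    dphi = np.angle(X2[k]) - np.angle(X1[k]) - expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi   # wrap deviation to (-pi, pi]
    return (k / N + dphi / (2 * np.pi * hop)) * fs

fs = 48000.0
t = np.arange(4096) / fs
print(phase_freq_estimate(np.cos(2 * np.pi * 1234.5 * t), fs))  # ~1234.5 Hz
```

The paper's point is that several published estimators of this flavor, despite differing derivations, reduce to the same computation; it then benchmarks them against magnitude-interpolation estimators and the CRB.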
Wave field synthesis interaction with the listening environment, improvements in the reproduction of virtual sources situated inside the listening room
Holophonic sound reproduction using Wave Field Synthesis (WFS) [1] aims at recreating a virtual spatialized sound scene in an extended area. Applying this technique to synthesize virtual sources located within an indoor environment can create striking audio effects in the context of virtual or augmented reality applications. However, interactions of the synthesized sound field with the listening room must be taken into account, for they cause modifications in the resulting sound field. This paper enumerates some of these interactions according to different virtual scene configurations and applications. Particular attention is paid to the reproduction of the sound source directivity and to the reproduction of a room effect coherent with the real environment. Solutions for synthesizing the directivity of the source and the associated room effect are proposed and discussed on the basis of simulations, developments, and a first perceptual validation.
Monitoring distance effect with wave field synthesis
Wave Field Synthesis (WFS) [1] rendering allows the reproduction of virtual point sources. Depending on source positioning, the wave front synthesized in the listening area exhibits a given curvature that is responsible for a spatial perspective sensation. It is then possible to monitor the distance of a source with a “holophonic distance” parameter concurrently with conventional distance cues based on the control of the direct/reverberation ratio. The holophonic distance parameter is presented and then discussed in the context of authoring sound scenes in WFS installations. This work has three goals: introducing WFS to sound engineers in an active listening test where they manipulate different parameters for the construction of a sound scene; assessing the perceptual relevance of holophonic distance modifications; and studying the possible link between the holophonic distance parameter and conventional subjective distance parameters traditionally used by sound engineers.
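To illustrate where the wavefront-curvature cue comes from, here is a heavily simplified delay-and-attenuate sketch of driving a linear loudspeaker array as a virtual point source. This is not a full WFS driving-function derivation, and the geometry, gain law, and function names are assumptions for illustration.

```python
import numpy as np

def wfs_point_source(src, speakers, c=343.0):
    """Per-loudspeaker delays and gains approximating a virtual point
    source behind a linear array (simplified delay-and-attenuate model).

    src      : (x, y) of the virtual source, behind the array
    speakers : (M, 2) array of loudspeaker positions
    """
    r = np.linalg.norm(speakers - np.asarray(src), axis=1)
    delays = r / c                 # relative delays shape the wavefront curvature
    gains = 1.0 / np.sqrt(r)       # simple amplitude rolloff with distance
    return delays - delays.min(), gains / gains.max()

# 16 speakers, 15 cm spacing, source 2 m behind the array center.
# Pulling the source closer increases wavefront curvature (the
# "holophonic distance" cue), independently of the direct/reverb ratio.
spk = np.stack([np.arange(16) * 0.15 - 1.125, np.zeros(16)], axis=1)
d, g = wfs_point_source((0.0, -2.0), spk)
```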
An expressive real-time sound model of rolling
This paper describes the structure and potential of a real-time sound model of “rolling”. The work has its background and motivation in the ecological approach to psychoacoustics. The scope of interest is the efficient and clear (possibly exaggerated) acoustic expression, cartoonification, of certain ecological attributes rather than realistic simulations for their own sake. To this end, different techniques of sound generation are combined in a hybrid hierarchical structure. A physics-based algorithm (section 2) of impact interaction at the audio core is surrounded by higher-level structures that explicitly model macroscopic characteristics (section 5). Another connecting audio-level algorithm, the “rolling-filter”, reduces the (3-dimensional) geometry of the rolling contact to the one dimension of the impact-interaction model (section 3).
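As a rough sense of the audio core, the sketch below drives a small bank of two-pole resonators with a sparse force signal standing in for the surface micro-profile of the rolling contact. This is a generic modal-impact stand-in, not the paper's impact-interaction algorithm; mode frequencies, decay times, and the bump density are arbitrary.

```python
import numpy as np
from scipy.signal import lfilter

def resonator(f0, t60, fs):
    """Denominator of a two-pole resonator at f0 Hz with a given
    60 dB decay time."""
    r = 10 ** (-3.0 / (t60 * fs))               # pole radius from T60
    a1 = -2 * r * np.cos(2 * np.pi * f0 / fs)
    return np.array([1.0, a1, r * r])

def roll(force, modes, fs=44100):
    """Excite a modal bank with a force signal derived from the
    surface micro-profile (a 1-D reduction of the rolling contact)."""
    y = np.zeros(len(force))
    for f0, t60, amp in modes:
        y += amp * lfilter([1.0], resonator(f0, t60, fs), force)
    return y

# Bumpy surface: sparse impacts whose density would scale with speed.
rng = np.random.default_rng(0)
force = (rng.random(44100) < 0.002) * rng.random(44100)
audio = roll(force, modes=[(523.0, 0.4, 1.0), (1289.0, 0.2, 0.5)])
```

In the paper's hierarchy, the higher-level structures would shape the statistics of the force signal (speed, surface roughness, object size) rather than leaving them fixed as here.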
EGOSOUND, an egocentric, interactive and real-time approach of sound space
Developed in the context of virtual environments linked to artistic creation, Egosound was conceived to visualize multiphonic sound space in an egocentric way. Ambisonic algorithms and a graphic editor for trajectories are implemented. An example of an artistic project using Egosound is presented. Parts of the software engine of Egosound are now available for download as external objects for Pure Data and Max/MSP graphical programming environments.
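For context on the ambisonic side, here is a minimal first-order (B-format) encoder of a mono source at a given azimuth, the kind of panning law that underlies an egocentric scene with moving trajectories. It is horizontal-only and uses the traditional W = S/sqrt(2) weighting; it is a generic sketch, not Egosound's actual implementation.

```python
import numpy as np

def encode_bformat(mono, azimuth):
    """First-order horizontal B-format encoding of a mono signal.
    `azimuth` may be a scalar or a per-sample array (a trajectory)."""
    w = mono / np.sqrt(2.0)            # omnidirectional component
    x = mono * np.cos(azimuth)         # front-back figure-of-eight
    y = mono * np.sin(azimuth)         # left-right figure-of-eight
    return w, x, y

fs = 44100
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t)
azi = 2 * np.pi * t / t[-1]            # one full revolution around the listener
W, X, Y = encode_bformat(sig, azi)
```

Passing a per-sample azimuth array, as above, is what a graphic trajectory editor would produce when a source is dragged around the listener.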
Musically expressive sound textures from generalized audio
We present a method of musically expressive synthesis-by-analysis that takes advantage of recent advancements in auditory scene analysis and sound separation algorithms. Our model represents incoming audio as a sub-conceptual model using statistical decorrelation techniques that abstract away individual auditory events, leaving only the gross parameters of the sound: the “eigensound” or generalized spectral template. Using these approaches we present various optimization guidelines and musical enhancements, specifically with regards to the beat and temporal nature of the sounds, with an eye towards real-time effects and synthesis. Our model results in completely novel and pleasing sound textures that can be varied with parameter tuning of the “unmixing” weight matrix.
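As a rough analogue of the decorrelation step, the sketch below factors a magnitude spectrogram into a few spectral templates and per-frame weights via SVD, then resynthesizes with rescaled weights. Plain PCA and the names "eigensound"/"tune" here stand in for the paper's actual decorrelation machinery and unmixing matrix.

```python
import numpy as np

def eigensound(mag_spec, n_comp=8):
    """Decompose a magnitude spectrogram (bins x frames) into spectral
    templates and per-frame activation weights via SVD."""
    mean = mag_spec.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(mag_spec - mean, full_matrices=False)
    templates = U[:, :n_comp]                  # gross spectral shapes
    weights = s[:n_comp, None] * Vt[:n_comp]   # per-frame activations
    return mean, templates, weights

def rebuild(mean, templates, weights, tune=None):
    """Resynthesize the spectrogram, optionally rescaling each
    component's weight row (the 'unmixing-weight tuning' knob)."""
    if tune is not None:
        weights = weights * np.asarray(tune)[:, None]
    return np.clip(mean + templates @ weights, 0.0, None)

rng = np.random.default_rng(2)
spec = np.abs(rng.standard_normal((513, 200)))     # placeholder spectrogram
m, T, W = eigensound(spec)
varied = rebuild(m, T, W, tune=[1.0, 2.0, 0.5, 1, 1, 1, 1, 1])
```

Scaling individual weight rows, as `tune` does here, is the kind of parameter manipulation that produces the varied textures the abstract describes.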