Estimating Parameters from Audio for an EG+LFO Model of Pitch Envelopes
Envelope generator (EG) and low-frequency oscillator (LFO) parameters give a compact representation of audio pitch envelopes. If these parameters are estimated from audio on a per-note basis, they could be used as part of an audio coding scheme. Recordings of various instruments and articulations were examined, and their pitch envelopes were extracted. EG and LFO parameters for the envelopes were then estimated using an evolutionary algorithm. The resulting estimated envelopes are compared both to the original envelope and to a fixed-pitch estimate (the mean pitch). Envelopes estimated using EG+LFO can closely represent the envelope of the original audio and are more accurate than the mean-pitch estimate.
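The abstract does not give the EG or LFO parameterization or the details of the evolutionary algorithm. The Python sketch below assumes a hypothetical six-parameter model in cents (base pitch, a single linear EG segment that settles to zero, and a sinusoidal LFO) and fits it with a simple elitist (1+λ) evolution strategy, then compares the fit with a mean-pitch estimate, mirroring the comparison described above. The parameter layout, step sizes and synthetic target are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def eg_lfo_envelope(params, t):
    """Pitch envelope in cents: base + linear EG segment + sinusoidal LFO.
    Hypothetical layout: [base, eg_start, eg_time, lfo_rate_hz, lfo_depth, lfo_phase]."""
    base, eg_start, eg_time, rate, depth, phase = params
    eg_time = abs(eg_time) + 1e-3                           # keep the EG segment length positive
    eg = eg_start * np.clip(1.0 - t / eg_time, 0.0, None)   # eg_start -> 0 at eg_time
    lfo = depth * np.sin(2.0 * np.pi * rate * t + phase)
    return base + eg + lfo

def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def fit_es(target, t, init, sigma, generations=300, children=20, seed=0):
    """Elitist (1+lambda) evolution strategy minimising RMS error in cents."""
    rng = np.random.default_rng(seed)
    parent = np.asarray(init, dtype=float)
    best = rms(eg_lfo_envelope(parent, t), target)
    for _ in range(generations):
        kids = parent + sigma * rng.standard_normal((children, parent.size))
        errs = [rms(eg_lfo_envelope(k, t), target) for k in kids]
        i = int(np.argmin(errs))
        if errs[i] < best:                  # replace the parent only on improvement
            parent, best = kids[i], errs[i]
    return parent, best

# Synthetic target: a -60 cent scoop settling over 0.15 s, plus 5.5 Hz vibrato.
t = np.linspace(0.0, 1.0, 200)
target = eg_lfo_envelope([0.0, -60.0, 0.15, 5.5, 20.0, 0.3], t)
mean_err = rms(np.full_like(t, target.mean()), target)   # fixed-pitch baseline
init = [target.mean(), 0.0, 0.1, 5.0, 0.0, 0.0]          # starts at the mean-pitch fit
sigma = np.array([2.0, 5.0, 0.02, 0.2, 2.0, 0.2])
params, fit_err = fit_es(target, t, init, sigma)
```

Because the strategy is elitist and starts from the mean-pitch solution (zero EG offset and zero LFO depth), the fitted error can never exceed the mean-pitch error.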
Audio FFT Filter Banks
FFT-based nonuniform filter banks are proposed based on channel-sized inverse FFTs applied to nonuniform frequency partitions (or overlap-add decompositions) of the Short-Time Fourier Transform (STFT). Audio filter banks (particularly octave filter banks) are considered as application examples. Trade-offs discussed include perfect reconstruction, aliasing cancellation, flexibility of filter-channel band edges, use of the FFT for speed, multirate time-domain channel signals, time-varying filtering, and associated issues.
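A minimal single-frame sketch of the idea, assuming numpy, a rectangular octave partition of the rfft bins and no analysis window: each band's bins pass through an inverse FFT of the band's own size, giving a downsampled complex baseband channel signal, and forward FFTs of the channels rebuild the frame exactly. A real STFT filter bank would add windowing and overlap-add, which is where trade-offs such as aliasing and band-edge flexibility arise. The function names and band layout are hypothetical.

```python
import numpy as np

def octave_edges(n_fft, n_bands):
    """Bin edges of a hypothetical octave partition of the n_fft//2 + 1 rfft bins."""
    edges = [n_fft // 2 + 1]
    for _ in range(n_bands - 1):
        edges.append(edges[-1] // 2)          # each lower band is about half as wide
    edges.append(0)
    return edges[::-1]                        # ascending order

def analyze(frame, edges):
    """One frame -> list of complex channel signals, one per band."""
    X = np.fft.rfft(frame)
    # Inverse FFT sized to the band: a baseband, downsampled channel signal.
    return [np.fft.ifft(X[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]

def synthesize(channels, edges, n_fft):
    """Forward FFT of each channel restores its bins; one irfft rebuilds the frame."""
    X = np.zeros(n_fft // 2 + 1, dtype=complex)
    for c, lo, hi in zip(channels, edges[:-1], edges[1:]):
        X[lo:hi] = np.fft.fft(c)
    return np.fft.irfft(X, n_fft)

frame = np.random.default_rng(0).standard_normal(1024)
edges = octave_edges(1024, 5)        # [0, 32, 64, 128, 256, 513]
channels = analyze(frame, edges)     # channel lengths 32, 32, 64, 128, 257
assert np.allclose(synthesize(channels, edges, 1024), frame)
```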
Novel Methods in Information Management for Advanced Audio Workflows
This paper discusses architectural aspects of a software library for unified metadata management in audio processing applications. The data incorporates editorial, production, acoustical and musicological features for a variety of use cases, ranging from adaptive audio effects to alternative, metadata-based visualisation. Our system is designed to capture information prescribed by modular ontology schemas. This supports the development of intelligent user interfaces and advanced media workflows in music production environments. To reach these goals, we argue for the need for modularity and interoperable semantics in representing information. We discuss the advantages of extensible Semantic Web ontologies over specialised but mutually incompatible metadata formats. Concepts and techniques permitting seamless integration with existing audio production software are described in detail.
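The library itself is not described in enough detail here to reproduce. As a toy illustration of why modular, interoperable vocabularies help, the Python sketch below keeps editorial, production and acoustical statements as subject-predicate-object triples in a single store, with each module using its own namespace prefix. The prefixes (`ed:`, `prod:`, `ac:`) and resource names are hypothetical and are not terms from any published ontology.

```python
class TripleStore:
    """Toy in-memory RDF-style store (hypothetical, not the paper's library)."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        # None acts as a wildcard, as in a basic triple pattern.
        return sorted(
            (t for t in self.triples
             if (s is None or t[0] == s)
             and (p is None or t[1] == p)
             and (o is None or t[2] == o)),
            key=repr,
        )

store = TripleStore()
# Editorial module
store.add("track:1", "ed:title", "Example Take 3")
store.add("track:1", "ed:performer", "artist:7")
# Production module
store.add("track:1", "prod:microphone", "mic:ribbon-01")
# Acoustical module (e.g. written by a feature extractor)
store.add("track:1", "ac:tempo_bpm", 120.0)

# An adaptive effect only needs the acoustical vocabulary ...
tempo = store.query(s="track:1", p="ac:tempo_bpm")[0][2]
# ... while a visualisation can pull every statement about the same resource.
everything = store.query(s="track:1")
```

Each module adds statements without knowing the others' schemas, and consumers select only the vocabulary they understand.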
Melody Line Detection and Source Separation in Classical Saxophone Recordings
We propose a system that separates saxophone melodies from composite recordings of saxophone with piano and/or orchestra. The system is intended to produce an accompaniment without the saxophone, suitable for rehearsal and practice. A Melody Line Detection (MLD) algorithm is proposed as the starting point for a source separation implementation that incorporates known information about typical saxophone melody lines and about the acoustic characteristics and range of the saxophone, in order to prevent and correct detection errors. By extracting reliable information about the soloist's melody line, the system separates the piano or orchestra accompaniment from the solo part. The system was tested with commercial recordings and achieved a detection accuracy of 79.7%. Source separation removes most of the saxophone sound from the accompaniment tracks while preserving their original character.
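The abstract does not detail the MLD algorithm. A deliberately simple stand-in, assuming numpy: pick the strongest spectral peak per frame inside an assumed instrument range, then median-filter the track to suppress isolated jumps. The default range of 100 to 1000 Hz and all other parameter values are illustrative assumptions, not the paper's saxophone model.

```python
import numpy as np

def detect_melody(x, sr, n_fft=1024, hop=512, fmin=100.0, fmax=1000.0, smooth=5):
    """Per-frame strongest peak within [fmin, fmax] Hz, then a median filter.
    fmin/fmax stand in for an instrument range (assumed values)."""
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    in_range = (freqs >= fmin) & (freqs <= fmax)
    track = []
    for start in range(0, len(x) - n_fft + 1, hop):
        mag = np.abs(np.fft.rfft(window * x[start:start + n_fft]))
        mag[~in_range] = 0.0                  # ignore bins outside the assumed range
        track.append(freqs[np.argmax(mag)])
    track = np.array(track)
    # Odd-length median filter as a crude stand-in for error correction.
    half = smooth // 2
    padded = np.pad(track, half, mode="edge")
    return np.array([np.median(padded[i:i + smooth]) for i in range(len(track))])

# A 440 Hz "melody" over a louder 60 Hz component that lies below the range.
sr = 8000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 2.0 * np.sin(2 * np.pi * 60 * t)
track = detect_melody(x, sr)
```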
Simplified Guitar Bridge Model for the Displacement Wave Representation in Digital Waveguides
In this paper, we present a simplified model for the string-bridge interaction in guitars and other string instruments simulated by digital waveguides. The bridge model is devised for the displacement wave representation so that it can be integrated with other models of the string's interaction with the player and with other parts of the instrument, which are easier to simulate and implement in this representation. The model is based on a multiplierless scattering matrix representing the string-bridge interaction. Although the junction is not completely physically inspired, we show that it is general enough to accommodate a variety of transfer functions under the sole requirement of passivity, and that it avoids a mismatch of integration constants when the bridge is in turn modeled by a digital waveguide. The model is completed with simple methods to introduce horizontal and vertical polarizations of the string displacement and sympathetic vibrations of other strings. The aim of this paper is not to provide the most general methods for guitar sound synthesis but rather to point to computationally cheap, scalable solutions suitable for real-time implementations in which the synthesizer runs alongside several other audio applications.
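The paper's actual scattering matrix is not given in the abstract. The sketch below only checks the two properties it relies on, assuming numpy: passivity (the largest singular value of the scattering matrix is at most 1) and multiplierless entries (taken here to mean 0 or ±2^-k, realisable with additions and bit shifts). The example is the textbook N-port junction of equal impedances, used purely for illustration.

```python
import numpy as np

def is_passive(S, tol=1e-12):
    # Passive (no energy gain) <=> the largest singular value of S is at most 1.
    return np.linalg.norm(S, 2) <= 1.0 + tol

def is_multiplierless(S):
    # Accept only entries 0 or +/- 2**-k with k >= 0 (adds and bit shifts only).
    for v in np.abs(np.asarray(S, dtype=float)).ravel():
        if v == 0.0:
            continue
        k = -np.log2(v)
        if k < 0 or k != np.round(k):
            return False
    return True

def junction(n):
    # Textbook n-port junction of equal impedances, S = (2/n) * ones - I
    # (the overall sign depends on the chosen wave variable).
    return (2.0 / n) * np.ones((n, n)) - np.eye(n)

S4 = junction(4)   # entries +0.5 and -0.5: multiplierless and lossless
S3 = junction(3)   # entries 2/3 and -1/3: lossless but needs multipliers
incoming = np.array([0.3, -0.1, 0.0, 0.2])
outgoing = S4 @ incoming   # a lossless junction preserves the wave-vector norm
```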
Source-Filter based Clustering for Monaural Blind Source Separation
In monaural blind audio source separation, a signal mixture is usually separated into more signals than there are active sources. It is therefore necessary to group the separated signals into the final source estimates. Grouping methods are traditionally supervised and thus need a learning step on appropriate training data. In contrast, we discuss unsupervised clustering of the separated channels using Mel frequency cepstral coefficients (MFCC). We show that replacing the decorrelation step of the MFCC by non-negative matrix factorization (NMF) improves the separation quality significantly. The algorithms have been evaluated on a large test set consisting of melodies played on different instruments, vocals, speech, and noise.
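A hedged sketch of the clustering step, assuming numpy and that each separated component has already been reduced to a non-negative, log-compressed mel spectrum (one row per component). It contrasts the usual MFCC decorrelation (a DCT-II over mel bands) with an NMF of the same matrix, whose per-component weights serve as features, and groups the components with a small k-means. The paper's exact feature definition may differ; the NMF variant (Lee-Seung multiplicative updates, Euclidean cost), the row normalisation and the toy spectra are assumptions.

```python
import numpy as np

def dct_features(logmel, n_coeffs):
    """Usual MFCC decorrelation step: DCT-II across mel bands (one row per component)."""
    n = logmel.shape[1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return logmel @ np.cos(np.pi * k * (m + 0.5) / n).T

def nmf_features(logmel, rank, iters=500, seed=0):
    """NMF in place of the DCT: logmel ~ W @ H, rows of W are the features."""
    rng = np.random.default_rng(seed)
    V = logmel + 1e-9
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    for _ in range(iters):   # Lee-Seung multiplicative updates, Euclidean cost
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W

def kmeans(X, k, iters=50):
    """Small k-means on unit-normalised rows with farthest-point initialisation."""
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Six hypothetical separated components: three "low" and three "high" timbres.
bands = np.arange(20)
low = np.exp(-((bands - 4) / 2.0) ** 2)
high = np.exp(-((bands - 15) / 2.0) ** 2)
mel = np.array([g * 10 * s for s in (low, high) for g in (1.0, 1.2, 0.8)])
logmel = np.log1p(mel)   # non-negative log compression, so NMF is applicable
labels_nmf = kmeans(nmf_features(logmel, rank=2), k=2)
labels_dct = kmeans(dct_features(logmel, n_coeffs=8), k=2)
```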
The Hough Transform for Binaural Source Localization
We introduce a new technique for the blind localization of several sound sources from two binaural signals. First, the binaural signals are organized as two-dimensional data in which each sound source appears as a line. Second, the Hough transform is used to detect these lines. The slopes of the lines give the mixing coefficients and the directions of arrival (azimuths). Two variants of the technique are proposed, based on either the interaural level difference or the interaural time difference alone. A brief comparison with a well-known localization method and promising results are shown; these are not exhaustive, however, and this paper should be regarded as a feasibility demonstration of the new technique.
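A minimal sketch of the line-detection step, assuming numpy and a level-difference setting: if each time-frequency point is dominated by a single source, the right-channel magnitude is roughly a fixed multiple of the left-channel magnitude, so that source's points lie on a line through the origin whose slope is its mixing coefficient. A standard (ρ, θ) Hough accumulator then finds the lines. The synthetic points, grid sizes and peak-suppression neighbourhood are illustrative assumptions, and the mapping from slope to azimuth is not shown.

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=101):
    """Standard (rho, theta) accumulator for lines x*cos(theta) + y*sin(theta) = rho."""
    pts = np.asarray(points, dtype=float)
    rho_max = np.max(np.hypot(pts[:, 0], pts[:, 1]))
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in pts:
        r = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((r + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[idx, cols] += 1                   # one vote per theta column
    return acc, rhos, thetas

def top_lines(acc, rhos, thetas, n_lines, nbhd=5):
    """Pick the strongest cells, zeroing a neighbourhood after each pick."""
    acc = acc.copy()
    found = []
    for _ in range(n_lines):
        i, j = np.unravel_index(np.argmax(acc), acc.shape)
        found.append((rhos[i], thetas[j]))
        acc[max(i - nbhd, 0):i + nbhd + 1, max(j - nbhd, 0):j + nbhd + 1] = 0
    return found

# (|L|, |R|) points for two hypothetical sources with mixing ratios 0.5 and 2.0.
x = np.linspace(0.1, 1.0, 50)
points = np.vstack([np.column_stack([x, 0.5 * x]), np.column_stack([x, 2.0 * x])])
acc, rhos, thetas = hough_lines(points)
lines = top_lines(acc, rhos, thetas, n_lines=2)
slopes = sorted(-np.cos(th) / np.sin(th) for _, th in lines)   # y = slope * x
```

With a 1° grid the recovered slopes are quantised (roughly 0.49 to 0.51 and 1.96 to 2.05 here); a finer θ grid trades computation for precision.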
Human Inspired Auditory Source Localization
This paper describes an approach for localizing a sound source in the complete azimuth plane of an auditory scene using a movable human dummy head. A new localization approach, which assumes that the sources are positioned on a circle around the listener, is introduced and performs better than standard approaches for humanoid source localization such as the Woodworth formula and the free-field formula. Furthermore, a localization approach based on approximated HRTFs is introduced and evaluated. Iterative variants of the algorithms enhance the localization accuracy and resolve specific localization ambiguities. In this way, a localization blur of approximately three degrees is achieved, which is comparable to human localization blur. Resolving front-back confusions allows reliable localization of the sources in the whole azimuth plane in up to 98.43% of the cases.
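The two baseline formulas named above can be sketched as follows, assuming numpy. The Woodworth spherical-head formula gives ITD = (a/c)(θ + sin θ) and the free-field formula gives ITD = (d/c) sin θ, where a is the head radius, d the ear distance and c the speed of sound; both are inverted here to map a measured ITD back to a lateral angle. The head dimensions and speed of sound are typical assumed values, not the paper's. In both models a source at θ and its front-back mirror at 180° - θ produce the same ITD, the kind of ambiguity the iterative, movable-head variants address.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, assumed
HEAD_RADIUS = 0.0875     # m, assumed typical value
EAR_DISTANCE = 0.175     # m, assumed (2 * HEAD_RADIUS)

def itd_woodworth(theta, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth ITD in seconds for a lateral angle theta in [0, pi/2] radians."""
    return a / c * (theta + np.sin(theta))

def azimuth_woodworth(itd, a=HEAD_RADIUS, c=SPEED_OF_SOUND, iters=60):
    """Invert the monotonic Woodworth formula by bisection; the sign gives the side.
    ITDs beyond the formula's maximum saturate at pi/2."""
    target = abs(itd)
    lo, hi = 0.0, np.pi / 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if itd_woodworth(mid, a, c) < target:
            lo = mid
        else:
            hi = mid
    return float(np.copysign(0.5 * (lo + hi), itd))

def azimuth_freefield(itd, d=EAR_DISTANCE, c=SPEED_OF_SOUND):
    """Free-field model ITD = d * sin(theta) / c, inverted with a clipped arcsin."""
    return float(np.arcsin(np.clip(itd * c / d, -1.0, 1.0)))

theta = np.deg2rad(30.0)
estimate = azimuth_woodworth(itd_woodworth(theta))   # recovers theta
```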
Impulse Response Measurement Techniques and their Applicability in the Real World
Measuring impulse responses is a common task in audio signal processing. This paper reviews three common measurement techniques: maximum length sequences (MLS), exponentially swept sines, and time delay spectrometry (TDS). The aim is to give the reader a brief tutorial on the methods, with special focus on their deficiencies, to aid in choosing the best algorithm for the task at hand. Additionally, a novel improvement to time delay spectrometry is presented that lifts its restriction to relatively short impulse responses.
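Of the three methods, maximum length sequences are the simplest to sketch. Assuming numpy, an ideal linear time-invariant system and steady-state periodic excitation (in practice usually reached by playing the sequence more than once), circular cross-correlation of the output with the ±1 MLS recovers the impulse response exactly, because the MLS autocorrelation is N at lag 0 and -1 elsewhere. The LFSR taps (10, 7) give a 1023-sample sequence; the test system is hypothetical, and real-world deficiencies such as nonlinear distortion or time variance are not modelled.

```python
import numpy as np

def mls(order, taps):
    """+/-1 maximum length sequence from a Fibonacci LFSR.
    taps are 1-based register positions of a primitive polynomial, e.g. (10, 7)."""
    state = [1] * order                       # any nonzero start state works
    seq = np.empty(2 ** order - 1)
    for n in range(seq.size):
        seq[n] = 1.0 - 2.0 * state[-1]        # map bits {0, 1} -> {+1, -1}
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq

def circular_xcorr(y, x):
    """r[k] = sum_n y[n] * x[n - k] (indices mod N), computed with FFTs."""
    return np.real(np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x))))

def estimate_ir(y, seq):
    # r[k] = (N + 1) h[k] - sum(h) and sum(r) = sum(h), hence this exact inverse.
    r = circular_xcorr(y, seq)
    return (r + r.sum()) / (seq.size + 1)

seq = mls(10, (10, 7))
h_true = np.zeros(seq.size)
h_true[:5] = [0.0, 0.5, 0.0, -0.25, 0.1]      # hypothetical system under test
y = np.real(np.fft.ifft(np.fft.fft(seq) * np.fft.fft(h_true)))   # periodic response
h_est = estimate_ir(y, seq)
```

The `r.sum()` correction removes the small DC bias that the -1 off-peak autocorrelation would otherwise leave in the estimate.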
Spectrally Matched Click Synthesis
We introduce Spectrally Matched Click synthesis, a novel application of FIR filter design that allows the creation of clicks of arbitrarily short duration whose magnitude spectra approximate those of arbitrary input sounds. We demonstrate its use in effects including incremental modification of attack strength and continuous, gradual “morphing” between any input sound and successively more impulsive/percussive sounds.
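One plausible way to realise this with FIR design is the frequency-sampling method, sketched below under stated assumptions (numpy; not necessarily the authors' design procedure): average the input sound's magnitude spectrum at the click's own FFT size, take a zero-phase inverse FFT, centre it, and taper it with a Hann window. The click length of 63 samples is an arbitrary example; shorter clicks sample the spectrum more coarsely, so the match becomes rougher.

```python
import numpy as np

def average_magnitude(x, n):
    """Mean rfft magnitude over non-overlapping length-n frames (zero-padded tail)."""
    x = np.asarray(x, dtype=float)
    n_frames = max(1, int(np.ceil(len(x) / n)))
    padded = np.zeros(n_frames * n)
    padded[:len(x)] = x
    frames = padded.reshape(n_frames, n)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def spectrally_matched_click(x, length=63):
    """Frequency-sampling FIR: zero-phase inverse FFT of the target magnitude,
    centred and Hann-tapered to `length` samples (odd lengths centre exactly)."""
    mag = average_magnitude(x, length)
    h = np.fft.irfft(mag, n=length)   # real and even-symmetric (zero phase)
    h = np.roll(h, length // 2)       # move the peak to the centre (linear phase)
    return h * np.hanning(length)

impulse = np.zeros(63)
impulse[0] = 1.0
click = spectrally_matched_click(impulse)   # a flat spectrum gives a centred impulse
noise_click = spectrally_matched_click(np.random.default_rng(0).standard_normal(4000))
```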