Soundspotter - A Prototype System for Content-based Audio Retrieval
We present the audio retrieval system “Soundspotter,” which allows the user to select a specific passage within an audio file and retrieve perceptually similar passages. The system extracts frame-based features from the sound signal and performs pattern matching on the resulting sequences of feature vectors. Finally, an adjustable number of best matches is returned, ranked by their similarity to the reference passage. Soundspotter comprises several alternative retrieval algorithms, including dynamic time warping and trajectory matching based on a self-organizing map. We explain the algorithms and report initial results of a comparative evaluation.
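As an illustration of the matching step, the following is a minimal dynamic time warping sketch over sequences of frame-based feature vectors, written with NumPy only; it is not the authors' implementation, and the feature extraction (e.g. MFCCs) is assumed to happen elsewhere.

    import numpy as np

    def dtw_distance(query, candidate):
        """Dynamic time warping cost between two sequences of feature vectors.

        query, candidate: 2-D arrays of shape (n_frames, n_features).
        Returns the accumulated alignment cost (lower = more similar).
        """
        n, m = len(query), len(candidate)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(query[i - 1] - candidate[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]

    def rank_passages(query, passages, k=5):
        """Return the k candidate passages most similar to the query passage."""
        scores = sorted((dtw_distance(query, p), idx) for idx, p in enumerate(passages))
        return scores[:k]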
A Hybrid Approach to Musical Note Onset Detection
Common problems with current methods of musical note onset detection are the detection of fast passages, the detection of all onsets within a passage with a strong dynamic range, and the detection of onsets of varying types, as in multi-instrumental music. We present a method that uses a subband decomposition approach to onset detection. An energy-based detector is used on the upper subbands to detect strong transient events. This yields precise time resolution of the onsets, but does not detect softer or weaker onsets. A frequency-based distance measure is formulated for use with the lower subbands, improving detection accuracy for softer onsets. We also present a method for improving the detection function by using a smoothed difference metric. Finally, we show that the detection threshold may be set automatically from the statistics of the detection function, with results in most cases comparable to manually set thresholds.
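A rough sketch of this general scheme, not the authors' code: an illustrative high-pass split with SciPy, a half-wave-rectified and smoothed energy difference as the detection function, and a threshold derived from the detection function's median and standard deviation. The cutoff, frame size, and weighting k are placeholder values.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def detection_function(x, sr, split_hz=1000.0, frame=512, hop=256):
        """Energy-based onset detection function computed on the upper subband."""
        sos = butter(4, split_hz, btype='highpass', fs=sr, output='sos')
        upper = sosfilt(sos, x)
        n_frames = 1 + (len(upper) - frame) // hop
        energy = np.array([np.sum(upper[i * hop:i * hop + frame] ** 2)
                           for i in range(n_frames)])
        diff = np.maximum(np.diff(energy, prepend=energy[0]), 0.0)  # half-wave rectified difference
        kernel = np.hanning(5)
        kernel /= kernel.sum()
        return np.convolve(diff, kernel, mode='same')               # smoothed difference metric

    def pick_onsets(df, hop, sr, k=1.5):
        """Automatic threshold from the statistics of the detection function."""
        thresh = np.median(df) + k * np.std(df)
        peaks = [i for i in range(1, len(df) - 1)
                 if df[i] > thresh and df[i] >= df[i - 1] and df[i] >= df[i + 1]]
        return [p * hop / sr for p in peaks]  # onset times in seconds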
Automatic Polyphonic Piano Note Extraction Using Fuzzy Logic in a Blackboard System
This paper presents a piano transcription system that transforms audio into MIDI format. Human knowledge and psychoacoustic models are implemented in a blackboard architecture, which allows knowledge to be added in a top-down manner. The analysis adapts to the information acquired. This technique is referred to as a prediction-driven approach, and it attempts to simulate the adaptation and prediction processes taking place in human auditory perception. In this paper we describe the implementation of polyphonic note recognition using a Fuzzy Inference System (FIS) as one of the knowledge sources in a blackboard system. The performance of the transcription system shows that polyphonic music transcription is still an unsolved problem, with a success rate of 45% according to Dixon's formula. However, if only the transcribed notes are considered, the success rate increases to 74%. Moreover, the results presented in [1] show that the transcription can be used successfully in a retrieval system, encouraging the authors to develop this technique further for more accurate transcription results.
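For reference, Dixon's formula is commonly understood as the number of correctly transcribed notes divided by the total of correct, spurious, and missed notes; the snippet below reflects that reading and is not code from the paper.

    def dixon_accuracy(correct, false_positives, false_negatives):
        """Transcription score in Dixon's form: correct / (correct + FP + FN)."""
        return correct / float(correct + false_positives + false_negatives)

    # Counting only the transcribed (reported) notes amounts to dropping the
    # missed notes from the denominator, i.e. correct / (correct + FP), which
    # is one way the higher 74% figure can be read.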
Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition
Polyphonic transcription requires correct identification of notes and chords. We have centered our efforts on piano chord identification. Pattern recognition using spectral patterns is used as the identification method: the spectrum of the signal is compared with a set of spectra (patterns). The patterns are generated by a piano model that takes into account acoustic parameters and typical manufacturer criteria, which are adjusted by training the model with a few notes. The algorithm identifies notes and, iteratively, chords. Chord identification requires spectral subtraction, which is performed using masks. The analysis algorithm used for training avoids the detection of false partials due to nonlinear components and takes inharmonicity into account for spectrum segmentation. The method has been tested with live piano sounds recorded from two different grand pianos. Successful identification of chords of up to four notes has been carried out.
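An illustrative sketch of such an identify-then-subtract loop; the correlation score, the 5% mask threshold, and the pattern dictionary are placeholders, not the piano-model patterns described above.

    import numpy as np

    def identify_chord(spectrum, patterns, max_notes=4, floor=1e-3):
        """Iteratively match note patterns against a magnitude spectrum.

        spectrum: 1-D magnitude spectrum of the chord.
        patterns: dict mapping note name -> magnitude pattern of equal length.
        """
        residual = spectrum.copy()
        remaining = dict(patterns)
        notes = []
        for _ in range(max_notes):
            if not remaining:
                break
            # score each candidate by normalized correlation with the residual
            scores = {n: float(np.dot(residual, p)) / (np.linalg.norm(p) + 1e-12)
                      for n, p in remaining.items()}
            best = max(scores, key=scores.get)
            if scores[best] < floor:
                break
            notes.append(best)
            # spectral subtraction via a mask over the matched pattern's strong bins
            pat = remaining.pop(best)
            residual[pat > 0.05 * pat.max()] = 0.0
        return notes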
Survey on Extraction of Sinusoids in Stationary Sounds
This paper surveys the numerous analysis methods that have been proposed for extracting the frequency, amplitude, and phase of sinusoidal components from stationary sounds, which is of great interest for spectral modeling, digital audio effects, and pitch tracking, for instance. We consider different methods that improve on the frequency resolution of a plain FFT and compare the frequency and amplitude accuracy of each. As the results show, all of the considered methods offer a clear advantage over the plain FFT.
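As an example of one of the simplest techniques in this family, quadratic (parabolic) interpolation of the log-magnitude spectrum around the peak bin refines the plain-FFT estimates of frequency and amplitude; the sketch below shows only that one method, not the full set compared in the paper.

    import numpy as np

    def interpolated_peak(x, sr):
        """Estimate frequency and amplitude of the dominant sinusoid in x by
        parabolic interpolation of the log-magnitude FFT around the peak bin."""
        n = len(x)
        mag = np.abs(np.fft.rfft(x * np.hanning(n)))
        k = int(np.argmax(mag[1:-1])) + 1                # peak bin (skip edges)
        a, b, c = np.log(mag[k - 1:k + 2] + 1e-12)
        delta = 0.5 * (a - c) / (a - 2 * b + c)          # fractional bin offset
        freq = (k + delta) * sr / n                      # refined frequency (Hz)
        amp = np.exp(b - 0.25 * (a - c) * delta)         # refined (windowed) peak magnitude
        return freq, amp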
Sinusoidal Parameter Extraction and Component Selection in a non Stationary Model
In this paper, we introduce a new analysis technique particularly suitable for the sinusoidal modeling of non-stationary signals. The method, based on the estimation of amplitude and frequency modulation, aims at improving the traditional Fourier parameters and enables a new peak selection process, so that only peaks with coherent parameters are considered in subsequent stages (e.g. partial tracking, synthesis). This allows our spectral model to better handle natural sounds.
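As a toy illustration of parameter-coherence-based peak selection (using a phase-vocoder frequency estimate rather than the authors' modulation estimator; names and tolerances are illustrative):

    import numpy as np

    def coherent_peaks(frame1, frame2, hop, sr, tol_bins=0.5):
        """Keep spectral peaks whose phase-derived instantaneous frequency agrees
        with their bin frequency -- a crude stand-in for 'coherent parameters'.

        frame1, frame2: two signal frames of equal length, separated by hop samples.
        """
        n = len(frame1)
        win = np.hanning(n)
        S1, S2 = np.fft.rfft(frame1 * win), np.fft.rfft(frame2 * win)
        mag = np.abs(S2)
        kept = []
        for k in range(1, len(mag) - 1):
            if not (mag[k] > mag[k - 1] and mag[k] > mag[k + 1]):
                continue                                            # not a local peak
            dphi = np.angle(S2[k]) - np.angle(S1[k]) - 2 * np.pi * k * hop / n
            dphi = (dphi + np.pi) % (2 * np.pi) - np.pi             # wrap to [-pi, pi)
            f_inst = (k + dphi * n / (2 * np.pi * hop)) * sr / n    # instantaneous frequency
            if abs(f_inst - k * sr / n) < tol_bins * sr / n:        # agreement within tol_bins bins
                kept.append((k, f_inst, float(mag[k])))
        return kept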
Sub-Band Independent Subspace Analysis for Drum Transcription
While Independent Subspace Analysis provides a means of separating sound sources from a single-channel signal, making it an effective tool for drum transcription, it does have a number of problems. Not least of these is that the amount of information required to allow separation of sound sources varies from signal to signal. To overcome this indeterminacy and improve the robustness of transcription, an extension of Independent Subspace Analysis to include sub-band processing is proposed. The use of this approach is demonstrated by its application in a simple drum transcription algorithm.
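An illustrative sketch of the sub-band idea, assuming scikit-learn's FastICA and placeholder band edges: ICA is run on the magnitude spectrogram of each frequency band separately, yielding per-band component activations.

    import numpy as np
    from scipy.signal import stft
    from sklearn.decomposition import FastICA

    def subband_isa(x, sr, band_edges_hz=(0, 200, 2000, 11025), n_comp=2):
        """Run independent subspace analysis separately on each frequency band
        of the magnitude spectrogram; returns one activation matrix per band."""
        f, t, Z = stft(x, fs=sr, nperseg=1024, noverlap=768)
        mag = np.abs(Z)                                   # (freq_bins, frames)
        activations = []
        for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
            band = mag[(f >= lo) & (f < hi), :]           # restrict to one sub-band
            if band.shape[0] < n_comp:
                continue
            ica = FastICA(n_components=n_comp, random_state=0)
            # time-varying envelopes of the band's independent components
            activations.append(ica.fit_transform(band.T).T)   # (n_comp, frames)
        return activations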
An Extension for Source Separation Techniques Avoiding Beats
The problem of separating individual sound sources from a mixture, known as Source Separation or Computational Auditory Scene Analysis (CASA), has become popular in recent decades. A number of methods have emerged from the study of this problem, some of which perform very well for certain types of audio sources, e.g. speech. For the separation of instruments in music, however, there are several shortcomings. In general, when instruments play together they are not independent of each other; more specifically, the time-frequency distributions of the different sources will overlap. Harmonic instruments in particular have a high probability of overlapping partials. If these overlapping partials are not separated properly, the separated signals will have a different sensation of roughness and the separation quality degrades. In this paper we present a method to separate overlapping partials in stereo signals. The method looks at the shapes of partial envelopes and minimizes the difference between such shapes in order to demix overlapping partials. It can be applied to enhance existing methods for source separation, e.g. blind source separation techniques, model-based techniques, and spatial separation techniques. We also discuss simpler methods that can work with mono signals.
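A toy version of the envelope-shape idea: given the amplitude envelope of a mixed (overlapping) partial and the envelope shape of a non-overlapped partial of one source, find the gain that minimizes the squared shape difference and attribute that portion to the source. The least-squares criterion and the names are illustrative, not the paper's formulation.

    import numpy as np

    def split_overlapping_partial(mixed_env, reference_env):
        """Attribute part of a mixed partial envelope to the source whose
        non-overlapped partials have the shape 'reference_env'.

        Minimizes || mixed_env - g * reference_env ||^2 over the gain g, then
        returns the estimated source envelope and the remainder (other source).
        """
        g = np.dot(mixed_env, reference_env) / (np.dot(reference_env, reference_env) + 1e-12)
        g = max(float(g), 0.0)                      # gains must be non-negative
        source_env = g * reference_env
        remainder = np.maximum(mixed_env - source_env, 0.0)
        return source_env, remainder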
Real Time Implementation of the HVXC MPEG-4 Speech Coder
In this paper we present the results of code optimization for the HVXC MPEG-4 speech coder. Two bit-rate formats are considered: 2 and 4 kbit/s. After a short description of the main features of HVXC, the results of the code optimization are reported: the real-time implementation, on a floating-point DSP, of three parallel 2 kbit/s or two parallel 4 kbit/s HVXC coders is shown to be possible.
Optimizing Digital Musical Effect Implementation for Multiple Processor DSP Systems
In the area of digital musical effect implementation, attention has lately been focused on computer workstations designed for digital sound processing, which perform all operations on audio signals in real time. They are in fact a combination of a powerful computer program and hardware cards with digital signal processors. Thanks to the increased power of personal computer processors, performing these operations on the CPU is now possible. However, in most cases digital signal processors are still used for this purpose, because digital musical effect modelling is more effective and more precise on a digital signal processor. In addition, processing on a digital signal processor saves CPU computing power for other functions.