Expressive Oriented Time-Scale Adjustment for Mis-Played Musical Signals Based on Tempo Curve Estimations
Musical recordings performed by non-proficient (amateur) performers include two types of tempo fluctuations: intended “tempo curves” and unintended “mis-played components” caused by poor control of the instrument. In this study, we propose a method for estimating the intended tempo fluctuations, called “true tempo curves,” from mis-played recordings. We also propose an automatic audio signal modification that removes the mis-played component by applying time-scale modification with the estimated true tempo curve. Onset timings are detected by an onset detection method based on the human auditory system. The true tempo curve is estimated by polynomial regression analysis using the detected onset timings and score information. The power spectrograms of the observed musical signals are then adjusted using the true tempo curve. A subjective evaluation was performed to test the closeness of the rhythm. The mean opinion scores of the adjusted sounds were higher than those of the original recordings, and the differences were significant for all tested instruments.
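The regression step can be illustrated with a minimal sketch. It assumes, purely for illustration, that onset time is modeled as a low-order polynomial of score position in beats, and that tempo is read off as the inverse slope of that polynomial; the abstract does not give the exact formulation. The names fit_polynomial and tempo_bpm, the polynomial degree, and the jitter values are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c[0] + c[1]*x + ... (lowest order first).
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def tempo_bpm(coeffs, beat):
    """Tempo at a score position: 60 / (dt/db), where t(b) is the fitted polynomial."""
    slope = sum(k * c * beat ** (k - 1) for k, c in enumerate(coeffs) if k > 0)
    return 60.0 / slope


# Hypothetical data: a gradually slowing performance plus small timing errors.
beats = list(range(8))                      # score positions (assumed known)
jitter = [0.0, 0.03, -0.02, 0.04, -0.03, 0.01, -0.04, 0.02]
onsets = [0.5 * b + 0.01 * b * b + j for b, j in zip(beats, jitter)]

coeffs = fit_polynomial(beats, onsets, degree=2)
print([round(tempo_bpm(coeffs, b), 1) for b in beats])
```

The smooth fitted curve stands in for the intended tempo, while the residuals between detected and fitted onsets play the role of the mis-played component.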
Information Retrieval of Marovany Zither Music Based on an Original Optical-Based System
In this work, we introduce an original optical-based retrieval system dedicated to the music analysis of the marovany zither, a traditional instrument of Madagascar. From a humanistic perspective, our motivation for studying this particular instrument is its cultural importance due to its association with a possession ritual called tromba. The long-term goal of this work is to achieve a systematic classification of the marovany musical repertoire in this context of trance, and to classify the different recurrent musical patterns according to identifiable information. From an engineering perspective, we worked on the problem of competing signals in audio field recordings, e.g., from audience participation or percussion instruments. To overcome this problem, we recommend the use of a multichannel optical recording, which offers the acquisition of independent signals corresponding to each string, a high signal-to-noise ratio (high sensitivity to string displacement and low sensitivity to external sources), and a systematic inter-note demarcation resulting from the finger-string contact. These optical signal characteristics greatly simplify the delicate task of automatic music transcription, especially for polyphonic music in noisy environments.
The Tonalness Spectrum: Feature-Based Estimation of Tonal Components
The tonalness spectrum shows the likelihood of a spectral bin being part of a tonal or non-tonal component. It is a non-binary measure based on a set of established spectral features. An easily extensible framework for the computation, selection, and combination of features is introduced. The results are evaluated and compared in two ways: first with a data set of synthetically generated signals, and then with real music signals in the context of a typical MIR application.
Perception & Evaluation of Audio Quality in Music Production
A dataset of audio clips was prepared and its audio quality assessed by subjective testing. Because the clips were encoded as digital signals, a large number of features could be extracted. A new objective metric is proposed, describing the Gaussian nature of a signal’s amplitude distribution. Correlations between objective measurements of the music signals and the subjective perception of their quality were found. Existing metrics were adjusted to match quality perception. A number of timbral, spatial, rhythmic and amplitude measures, in addition to predictions of emotional response, were found to be related to the perception of quality. The emotional features were found to be the most important, indicating a connection between quality and a unified set of subjective and objective parameters.
Unsupervised Audio Key and Chord Recognition
This paper presents a new methodology for determining the chords of a music piece without using training data. Specifically, we introduce: 1) a wavelet-based audio denoising component to enhance a chroma-based feature extraction framework, 2) an unsupervised key recognition component to extract a bag of local keys, and 3) a chord recognizer that uses the estimated local keys to adjust the chromagram based on a set of well-known tonal profiles and recognizes chords on a frame-by-frame basis. We aim to recognize 5 classes of chords (major, minor, diminished, augmented, suspended) and 1 N class (no chord or silence). We demonstrate the performance of the proposed approach on 175 Beatles songs, achieving an F-measure of 75% for estimating a bag of local keys and at least 68.2% accuracy on chords, without discarding any audio segments or using other musical elements. The experimental results also show that the wavelet-based denoiser improves the chord recognition rate by approximately 4% over that of other chroma features.
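As a rough illustration of frame-by-frame chord recognition from chroma, the sketch below matches a single 12-bin chroma frame against binary chord templates. This is generic template matching, not the paper's method: it omits the wavelet denoising and the key-based chromagram adjustment, and it assumes "suspended" means sus4. The names QUALITIES and recognize_chord and the silence threshold are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Chord qualities as semitone intervals above the root.
# Assumption: "suspended" is taken to mean sus4.
QUALITIES = {
    "maj": (0, 4, 7),
    "min": (0, 3, 7),
    "dim": (0, 3, 6),
    "aug": (0, 4, 8),
    "sus4": (0, 5, 7),
}


def recognize_chord(chroma, silence_threshold=1e-3):
    """Return the best-matching chord label for one 12-bin chroma frame.

    Returns "N" (no chord or silence) when the frame has almost no energy.
    Ties (e.g. between the three spellings of an augmented triad) are
    resolved in favor of the first candidate tested.
    """
    if sum(chroma) < silence_threshold:
        return "N"
    best_label, best_score = "N", float("-inf")
    for root in range(12):
        for quality, intervals in QUALITIES.items():
            # Score = chroma energy falling on the template's pitch classes.
            score = sum(chroma[(root + i) % 12] for i in intervals)
            if score > best_score:
                best_label, best_score = f"{NOTE_NAMES[root]}:{quality}", score
    return best_label


# Idealized chroma frames (energy only on the chord tones).
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
a_minor = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(recognize_chord(c_major), recognize_chord(a_minor), recognize_chord([0.0] * 12))
```

In a key-aware system, the chroma frame would be reweighted by the tonal profile of the estimated local key before this matching step.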
Efficient DSP Implementation of Median Filtering for Real-Time Audio Noise Reduction
In this paper, an efficient real-time implementation of a median filter on a DSP platform is described. The implementation is based on a doubly linked list, which allows effective handling of the operations needed for the running computation of a median value. The structure of the doubly linked list is mapped onto the DSP architecture, exploiting its special features for an efficient implementation. As an application example, a real-time denoiser for vinyl record playback is presented. The application program consists of two main parts, namely a subsystem for click detection and a subsystem for click removal. Both parts can be implemented using median filters.
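The idea of a running median that updates incrementally, rather than re-sorting the window for every sample, can be sketched as follows. Note the substitution: this sketch keeps the window in a sorted Python list maintained with bisect, plus a FIFO for sample age, instead of the doubly linked list mapped onto DSP hardware that the paper describes. The function name running_median is hypothetical.

```python
from bisect import bisect_left, insort
from collections import deque


def running_median(samples, window):
    """Causal running median over the most recent `window` samples.

    During warm-up (fewer than `window` samples seen) the median is taken
    over the samples available so far; for an even count the upper middle
    element is returned.
    """
    fifo = deque()   # samples in arrival order, to know which one expires
    ordered = []     # the same samples, kept sorted
    out = []
    for x in samples:
        fifo.append(x)
        insort(ordered, x)                          # insert new sample in order
        if len(fifo) > window:
            old = fifo.popleft()
            del ordered[bisect_left(ordered, old)]  # remove the expired sample
        out.append(ordered[len(ordered) // 2])
    return out


# An isolated click (the 9) in an otherwise silent signal is suppressed.
print(running_median([0, 0, 9, 0, 0], window=3))
```

Each step performs one ordered insertion and one removal, which is the same pair of operations a linked-list implementation has to support.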
Separation of Unvoiced Fricatives in Singing Voice Mixtures with Semi-Supervised NMF
Separating the singing voice from a musical mixture is a widely addressed problem due to its various applications. However, most approaches do not tackle the separation of unvoiced consonant sounds, which causes a loss of quality in any vocal source separation algorithm and is especially noticeable for unvoiced fricatives (e.g. /T/ in “thing”) due to their energy level and duration. Fricatives are consonants produced by forcing air through a narrow channel made by placing two articulators close together. We propose a method to model and separate unvoiced fricative consonants based on a semi-supervised Non-negative Matrix Factorization, in which a set of spectral basis components is learnt from a training excerpt. We implemented this method as an extension of an existing well-known factorization approach for singing voice (SIMM). An objective evaluation shows a small improvement in the separation results. Informal listening tests show a significant increase in the intelligibility of the isolated vocals.
A Complex Wavelet Based Fundamental Frequency Estimator in Single-Channel Polyphonic Signals
In this work, a new estimator of the fundamental frequencies (F0) present in a polyphonic single-channel signal is developed. The signal is modeled in terms of a set of discrete partials obtained by the Complex Continuous Wavelet Transform (CCWT). The fundamental frequency estimation is based on the energy distribution of the detected partials of the input signal, followed by a spectral smoothness technique. The proposed algorithm is designed to work with suppressed fundamentals, inharmonic partials and harmonically related sounds. The technique has been tested on a set of input signals with polyphony from 2 to 6, and the high-precision results show the strength of the algorithm. These results are promising for using the algorithm as the basis of Blind Sound Source Separation or automatic score transcription techniques.
Maximum Filter Vibrato Suppression for Onset Detection
We present SuperFlux, a new onset detection algorithm with vibrato suppression. It is an enhanced version of the universal spectral flux onset detection algorithm, and it considerably reduces the number of false positive detections by tracking spectral trajectories with a maximum filter. Especially for music with heavy use of vibrato (e.g., sung operas or string performances), the number of false positive detections can be reduced by up to 60% without missing any additional events. Algorithm performance was evaluated and compared to state-of-the-art methods on the basis of three different datasets comprising mixed audio material (25,927 onsets), violin recordings (7,677 onsets) and operatic solo voice recordings (1,448 onsets). Due to its causal nature, the algorithm is applicable in both offline and online real-time scenarios.
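The core idea, spectral flux computed against a maximum-filtered earlier frame, can be sketched in a few lines. This is a simplified illustration under stated assumptions: it takes a precomputed magnitude spectrogram as a list of frames, and it omits the filterbank, logarithmic scaling, and peak picking that a full onset detector needs. The names max_filter and flux_with_max_filter and the default lag and width are hypothetical.

```python
def max_filter(frame, width=3):
    """Maximum filter along the frequency axis (window centered on each bin)."""
    half = width // 2
    n = len(frame)
    return [max(frame[max(0, k - half):min(n, k + half + 1)]) for k in range(n)]


def flux_with_max_filter(spectrogram, lag=1, width=3):
    """Onset detection function: half-wave rectified difference between each
    frame and the maximum-filtered frame `lag` frames earlier, summed over bins."""
    odf = [0.0] * len(spectrogram)
    for n in range(lag, len(spectrogram)):
        reference = max_filter(spectrogram[n - lag], width)
        odf[n] = sum(max(cur - ref, 0.0)
                     for cur, ref in zip(spectrogram[n], reference))
    return odf


# A partial drifting by one bin (vibrato-like) produces no flux ...
vibrato = [[0, 0, 1, 0, 0], [0, 0, 0, 1, 0]]
# ... while energy appearing from silence (an onset) does.
onset = [[0, 0, 0, 0, 0], [0, 0, 1, 0, 0]]
print(flux_with_max_filter(vibrato), flux_with_max_filter(onset))
```

Plain spectral flux would report the same positive value for both examples, because the drifting partial looks like new energy in the neighboring bin; widening the reference frame with the maximum filter removes that false positive.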
Generating Musical Accompaniment Using Finite State Transducers
The finite state transducer (FST), a type of finite state machine that maps an input string to an output string, is a common tool in the fields of natural language processing and speech recognition. FSTs have also been applied to music-related tasks such as audio fingerprinting and the generation of musical accompaniment. In this paper, we describe a system that uses an FST to generate harmonic accompaniment to a melody. We detail the methods employed to quantize a music signal and the topology of the transducer, and we discuss our approach to evaluating the system. We argue for an evaluation metric that takes into account the quality of the generated accompaniment, rather than one that returns a binary value indicating the correctness or incorrectness of the accompaniment.
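To make the input-string-to-output-string mapping concrete, here is a toy deterministic transducer that reads melody notes and emits chord symbols. It is only a sketch: the topology, the chord candidates, and the "prefer a chord different from the previous one" rule are all invented for illustration and are not the system described in the paper, which would typically be weighted and learned from data. The names CANDIDATES, build_transitions, and transduce are hypothetical.

```python
# Hypothetical chord candidates for each melody note (C major, chords I, IV, V).
CANDIDATES = {
    "C": ["C", "F"],
    "D": ["G"],
    "E": ["C"],
    "F": ["F"],
    "G": ["C", "G"],
    "A": ["F"],
    "B": ["G"],
}
# States record the previously emitted chord.
STATES = ["start", "C", "F", "G"]


def build_transitions():
    """Build the table (state, input note) -> (output chord, next state)."""
    table = {}
    for state in STATES:
        for note, chords in CANDIDATES.items():
            # Invented rule: prefer a chord that differs from the previous one.
            chord = next((c for c in chords if c != state), chords[0])
            table[(state, note)] = (chord, chord)
    return table


def transduce(melody, transitions, start="start"):
    """Run the transducer over the melody and collect the output symbols."""
    state, output = start, []
    for note in melody:
        chord, state = transitions[(state, note)]
        output.append(chord)
    return output


print(transduce(["C", "C", "G", "B"], build_transitions()))
```

Because the output depends on the current state as well as the input note, the same melody note can receive different chords in different contexts, which is what distinguishes a transducer from a plain lookup table.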