Singing Voice Resynthesis Using Vocal Sound Libraries
Although resynthesis may seem a simple analysis/synthesis process, it is quite a complex task, even more so when it comes to recreating a singing voice. This paper presents a system whose goal is to start with an original audio stream of someone singing and recreate the same performance (melody, phonetics, dynamics) using an internal vocal sound library (choir or solo voice). By extracting dynamics and pitch information, and looking for phonetic similarities between the original audio frames and the frames of the sound library, a completely new audio stream is created. The audio results obtained, although not perfect (mainly due to the existence of audio artifacts), show that this technological approach may become an extremely powerful audio tool.
Simulating Idiomatic Playing Styles in a Classical Guitar Synthesizer: Rasgueado as a Case Study
This paper presents our research efforts to synthesize complex instrumental gestures using a score-based control scheme. Our specific goal is to simulate the rasgueado technique, which is especially popular in flamenco music. This technique is also used in the classical guitar repertoire. Rasgueado is especially challenging as ordinary music notation is not adequate to represent the dense stream of notes required for a convincing simulation. We take two approaches to realize our task. First, we use the practical knowledge of how the actual performance is accomplished by the human player. A second, complementary, approach is to analyze an excerpt from real guitar playing. Our main focus here is to extract the onset times and the amplitudes of the recorded gesture. Next, we combine the results from the two analysis steps using a constraint-based approach to find possible pitch and fingering sequences. Finally, we translate the findings to our macro-note scheme, which allows us to fill in a musical score algorithmically.
Fan Chirp Transformation for Music Representation
In this work the Fan Chirp Transform (FChT), which provides an acute representation of harmonically related linear chirp signals, is applied to the analysis of pitch content in polyphonic music. The implementation introduced was devised to be computationally manageable and enables the generalization of the FChT for the analysis of non-linear chirps. The combination with the Constant Q Transform is explored to build a multi-resolution FChT. An existing method to compute pitch salience from the FChT is improved and adapted to handle polyphonic music. In this way a useful melodic content visualization tool is obtained. The results of a frame-based melody detection evaluation indicate that the introduced technique is very promising as a front-end for music analysis.
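As an illustration of why a chirp-matched analysis gives a sharper representation, the following is a minimal sketch of a directly evaluated fan chirp transform. It assumes one common formulation from the FChT literature, X(f, α) = ∫ x(t) φ'(t) exp(−j2πf φ(t)) dt with the warping φ(t) = (1 + αt/2)·t; the function names, the brute-force Riemann sum and the test signal are illustrative assumptions, not the computationally efficient implementation described in the paper.

```python
import cmath
import math


def fan_chirp_transform(frame, sample_rate, freqs, alpha):
    """Evaluate a discretised fan chirp transform of one frame.

    Assumed formulation: X(f, alpha) is the integral of
    x(t) * phi'(t) * exp(-2j*pi*f*phi(t)) dt, with the warping
    phi(t) = (1 + alpha*t/2) * t and t centred on the frame.
    """
    n = len(frame)
    times = [(i - (n - 1) / 2) / sample_rate for i in range(n)]
    spectrum = []
    for f in freqs:
        acc = 0j
        for x, t in zip(frame, times):
            warped_t = (1 + 0.5 * alpha * t) * t
            acc += x * (1 + alpha * t) * cmath.exp(-2j * math.pi * f * warped_t)
        spectrum.append(acc / sample_rate)  # Riemann sum with dt = 1/sample_rate
    return spectrum


def linear_chirp(n, sample_rate, f0, alpha):
    """Complex test chirp with instantaneous frequency f0 * (1 + alpha*t)."""
    times = [(i - (n - 1) / 2) / sample_rate for i in range(n)]
    return [cmath.exp(2j * math.pi * f0 * (1 + 0.5 * alpha * t) * t) for t in times]


# Hypothetical test signal: 256 samples at 1 kHz, 100 Hz centre, chirp rate 4 /s.
chirp = linear_chirp(256, 1000.0, 100.0, 4.0)
matched = abs(fan_chirp_transform(chirp, 1000.0, [100.0], 4.0)[0])
unwarped = abs(fan_chirp_transform(chirp, 1000.0, [100.0], 0.0)[0])
```

With the matching chirp rate, the energy of the chirp concentrates at its centre frequency, whereas the unwarped case (α = 0, an ordinary Fourier analysis of the frame) smears it across neighbouring frequencies, so `matched` comes out considerably larger than `unwarped`.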
A Reduced Multiple Gabor Frame for Local Time Adaptation of the Spectrogram
In this paper we propose a method for automatic local time adaptation of the spectrogram of an audio signal, based on its decomposition within a Gabor multi-frame. The sparsity of the analyses within each individual frame is evaluated through Rényi entropy measures. According to the sparsity of the decompositions, an optimal resolution and a reduced multi-frame are determined, defining an adapted spectrogram with variable resolution and hop size. The composition of such a reduced multi-frame allows an immediate definition of a dual frame: re-synthesis techniques for this adapted analysis are easily derived from the traditional phase vocoder scheme.
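The sparsity criterion can be sketched as follows. This is a minimal illustration, assuming the usual definition of the Rényi entropy of order α applied to spectrogram energies normalised to sum to one; the order α = 3, the function names and the dictionary of candidate analyses are assumptions for illustration, not details taken from the paper.

```python
import math


def renyi_entropy(energies, alpha=3.0):
    """Rényi entropy (in bits) of non-negative energies, normalised to sum to 1.

    Lower values indicate a sparser (more concentrated) representation.
    """
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit and is not handled here")
    total = sum(energies)
    if total <= 0:
        raise ValueError("energies must have a positive sum")
    probs = [e / total for e in energies]
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def sparsest_analysis(candidates, alpha=3.0):
    """Return the label of the candidate analysis with the lowest entropy.

    `candidates` maps a label (for example a window length) to the flattened
    list of spectrogram energies obtained with that analysis.
    """
    return min(candidates, key=lambda label: renyi_entropy(candidates[label], alpha))


# Hypothetical energies: one concentrated analysis, one evenly spread.
best = sparsest_analysis({"short": [1.0] * 8, "long": [8.0, 0, 0, 0, 0, 0, 0, 0]})
```

Here `best` is `"long"`, because all of its energy sits in a single coefficient, while the evenly spread analysis has the maximum entropy of 3 bits for eight coefficients.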
Adjusting the Spectral Envelope Evolution of Transposed Sounds with Gabor Mask Prototypes
Audio samplers often need to modify the pitch of recorded sounds in order to generate scales or chords. This article tackles the use of Gabor masks and their capacity to improve the perceptual realism of transposed notes obtained through the classical phase vocoder algorithm. Gabor masks can be seen as operators that allow the modification of the time-dependent spectral content of sounds by modifying their time-frequency representation. The goal here is to restore a distribution of energy that is more in line with the physics of the structure that generated the original sound. The Gabor mask is elaborated using an estimation of the spectral envelope evolution in the time-frequency plane, and then applied to the modified Gabor transform. This operation turns the modified Gabor transform into another one which respects the estimated spectral envelope evolution, and therefore leads to a note that is more perceptually convincing.
Frequency, Phase and Amplitude Estimation of Overlapping Partials in Monaural Musical Signals
A method is described that simultaneously estimates the frequency, phase and amplitude of two overlapping partials in a monaural musical signal from the amplitudes and phases in three frequency bins of the signal’s Odd Discrete Fourier Transform (ODFT). From the transform of the analysis window in its analytical form, and given the frequencies of the two partials, an analytical solution for the amplitude and phase of the two overlapping partials was obtained. Furthermore, the frequencies are estimated by numerically solving a system of two equations in two unknowns, since no analytical solution could be found. Although the estimation is done independently frame by frame, particular situations (e.g. extremely close frequencies, same phase in the time window) lead to errors, which can be partly corrected with a moving average filter over several time frames. Results are presented for artificial sinusoids with time-varying frequencies and amplitudes, and with different levels of noise added. The system still performs well with a Signal-to-Noise ratio down to 30 dB, with moderately modulated frequencies, and time-varying amplitudes.
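For readers unfamiliar with the ODFT, the sketch below evaluates its usual definition, X(k) = Σ x(n) exp(−j2π(k + 1/2)n/N), in which the bin centres are shifted by half a bin relative to the DFT. It also includes a simple centred moving average of the kind that could smooth per-frame estimates. Both functions are illustrative assumptions with hypothetical names: the direct O(N²) sum is chosen for clarity, and neither the analysis window nor the paper's two-partial estimator is reproduced here.

```python
import cmath
import math


def odft(x):
    """Odd DFT: bin k is centred at frequency (k + 0.5) / N cycles per sample."""
    n_samples = len(x)
    return [
        sum(
            x[n] * cmath.exp(-2j * math.pi * (k + 0.5) * n / n_samples)
            for n in range(n_samples)
        )
        for k in range(n_samples)
    ]


def moving_average(values, width):
    """Centred moving average; the averaging span shrinks at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# A complex exponential placed exactly on the centre of ODFT bin 3 (N = 16).
signal = [cmath.exp(2j * math.pi * 3.5 * n / 16) for n in range(16)]
spectrum = odft(signal)
```

In this example all of the energy falls into bin 3, whereas an ordinary DFT would split the same sinusoid between bins 3 and 4 with leakage into the others.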
Breaking the Bounds: Introducing Informed Spectral Analysis
Sound applications based on sinusoidal modeling highly depend on the efficiency and the precision of the estimators of its analysis stage. In a previous work, theoretical bounds for the best achievable precision were shown and these bounds are reached by efficient estimators like the reassignment or the derivative methods. We show that it is possible to break these theoretical bounds with just a few additional bits of information of the original content, introducing the concept of “informed analysis”. This paper shows that existing estimators combined with some additional information can reach any expected level of precision, even in very low signal-to-noise ratio conditions, thus enabling high-quality sound effects, without the typical but unwanted musical noise.
Improving RTISI Phase Estimation with Energy Order and Phase Unwrapping
This paper presents two ways to improve the Real-Time Iterative Spectrogram Inversion (RTISI) algorithm. The standard RTISI phase estimator with look-ahead processes the buffered frames in reverse order. We show that better results are achieved by controlling this order according to frame energy. Another improvement is to initialize the last row of the phase estimator buffer by progressing the unwrapped phase difference of the previous frames. Furthermore, we extend these improvements to dual window length phase estimation and analyze the performance in SER with respect to different analysis window lengths.
On the Use of Sums of Sines in the Design of Signal Windows
Windowing of discrete signals by temporal weighting is an essential tool for spectral analysis and processing to reduce bias effects. Many popular weighting functions (e.g. Hann, Hamming, Blackman) are based on a sum of scaled cosines. This paper presents an alternative class of windows, constructed using sums of sines and exhibiting unique spectral behavior with regard to zero location and a side lobe decay of at least –12 dB/octave due to guaranteed continuity of the weighting. The parameters for the 2- and 3-term realizations with minimum peak side lobe level are provided. Usage of the sum-of-sines windows with the DFT and their adaptation to lapped transforms such as the MDCT are also examined.
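To make the contrast concrete, the sketch below builds the familiar cosine-sum windows from their standard coefficients and, for comparison, the well-known one-term sine window commonly used with the MDCT. The function names are illustrative assumptions, and the coefficients of the paper's 2- and 3-term sum-of-sines windows are not given in the abstract, so they are not reproduced here.

```python
import math


def cosine_sum_window(coeffs, length):
    """Periodic window w[n] = sum_k (-1)^k * coeffs[k] * cos(2*pi*k*n / length)."""
    return [
        sum(
            (-1) ** k * a * math.cos(2 * math.pi * k * n / length)
            for k, a in enumerate(coeffs)
        )
        for n in range(length)
    ]


def sine_window(length):
    """One-term sine window w[n] = sin(pi * (n + 0.5) / length)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


hann = cosine_sum_window([0.5, 0.5], 8)
hamming = cosine_sum_window([0.54, 0.46], 8)
blackman = cosine_sum_window([0.42, 0.5, 0.08], 8)
sine = sine_window(8)

# Power-complementarity (Princen-Bradley) check for the sine window:
# w[n]^2 + w[n + N/2]^2 should equal 1 for every n in the first half.
power_sums = [sine[n] ** 2 + sine[n + 4] ** 2 for n in range(4)]
```

The sine window satisfies the power-complementarity condition required for perfect reconstruction in lapped transforms, which is one reason sine-based windows are a natural starting point for MDCT use.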
Harmonize-Decompose Audio Signals with Global Amplitude and Frequency Modulations
A key building block in music transcription and indexing operations is the decomposition of music signals into notes. We model a note signal as a periodic signal with slow (frequency-selective) amplitude modulation and global frequency-warping. Global frequency-warping allows for an inharmonic frequency modulation, while the global amplitude modulation allows the various harmonics of the periodic signal to decay at different speeds. The global frequency-warping is achieved by a Laguerre transform (which has been shown to fit the inharmonic behavior of stiff strings). Assuming additive noise, the estimation of the model parameters and the optimization are performed in a Harmonize-Extract fashion. Simulations illustrate that the extraction technique overcomes the limitations of the global AM-FM representation and analysis techniques and allows the processing of inharmonic string instruments (e.g. piano).