Automatic Partial Extraction From the Modal Distribution
The Modal Distribution (MD) is a time-frequency distribution specifically designed to model the quasi-harmonic, multisinusoidal nature of music signals, and it belongs to the Cohen general class of time-frequency distributions. The problem of signal synthesis from bilinear time-frequency representations such as the Wigner distribution has been investigated [1,14] using methods which exploit an outer-product interpretation of these distributions. Methods of synthesis from the MD based on a sinusoidal-analysis-synthesis procedure using estimates of instantaneous frequency and amplitude values have relied on a heuristic search ‘by eye’ for peaks in the time-frequency domain [2,7,8]. An approach to detection of sinusoidal components with the Wigner Distribution has been investigated in [15] based on a comparison of peak magnitudes with the DFT and STFT. In this paper we propose an improved frequency smoothing kernel for use in MD partial tracking and adapt the McAulay-Quatieri sinusoidal analysis procedure to enable a sum-of-sinusoids synthesis. We demonstrate that the improved kernel enhances automatic partial extraction and that the MD estimates of instantaneous amplitude and frequency are preserved. Suggestions for future extensions to the synthesis procedure are given.
VST Plug-in Module Performing Wavelet Transform in Real-time
The paper presents a variant of the segmentwise wavelet transform (blockwise DWT, online DWT or SegDWT) algorithm adapted to real-time audio processing, together with its implementation as a VST plugin. The main problem of segmentwise wavelet coefficient processing is the handling of the segment borders. The common border extension methods result in “false” coefficients, which in turn result in border distortion (block-end effects) after particular types of coefficient processing. In contrast, the SegDWT algorithm employs a segment extension technique to prevent this problem and produce exactly the same coefficients as the wavelet transform of the whole signal would. In this paper we remove some of the shortcomings of the original SegDWT algorithm; for example, the need for the “right” segment extension is eliminated. The VST plugin module created is described from the viewpoints of both the user and the programmer; the latter can easily add their own method for processing the coefficients.
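The border problem and the left-extension remedy can be illustrated with a deliberately simplified, single-level sketch. It is not the SegDWT algorithm from the paper: the causal indexing convention, the choice of the 4-tap Daubechies low-pass filter, and all function names are assumptions made for illustration. Under these assumptions, with a filter of length L, prepending the last L - 2 samples of the previous segment lets each segment reproduce the whole-signal coefficients, whereas zero-padding every segment independently corrupts the first coefficient after each border.

```python
import math

# 4-tap Daubechies low-pass filter (assumed here; any FIR filter works the same way).
_R3, _NORM = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
H = [(1 + _R3) / _NORM, (3 + _R3) / _NORM, (3 - _R3) / _NORM, (1 - _R3) / _NORM]

def coeffs_from_block(block, h, count):
    """Compute `count` coefficients from a block that starts with len(h) - 2 history samples."""
    ext = len(h) - 2
    return [sum(h[k] * block[ext + 2 * m + 1 - k] for k in range(len(h)))
            for m in range(count)]

def dwt_whole(x, h):
    """One-level approximation coefficients of the whole signal (zeros before the start)."""
    return coeffs_from_block([0.0] * (len(h) - 2) + list(x), h, len(x) // 2)

def dwt_segmentwise(x, h, seg_len, use_history=True):
    """Process x in segments of even length seg_len, optionally with left extension."""
    ext = len(h) - 2
    out = []
    for s in range(0, len(x), seg_len):
        # Left extension: the last `ext` samples that precede this segment.
        history = list(x[max(0, s - ext):s]) if use_history else []
        history = [0.0] * (ext - len(history)) + history  # zero-pad missing history
        block = history + list(x[s:s + seg_len])
        out.extend(coeffs_from_block(block, h, seg_len // 2))
    return out

signal = [math.sin(0.3 * t) + 0.5 for t in range(32)]
whole = dwt_whole(signal, H)
extended = dwt_segmentwise(signal, H, seg_len=8)                   # matches `whole`
naive = dwt_segmentwise(signal, H, seg_len=8, use_history=False)   # border errors
```

Comparing `extended` and `naive` against `whole` shows that only the left-extended version agrees at every index. A multi-level transform would need a longer extension per level, which this sketch does not attempt.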
Practical Empirical Mode Decomposition For Audio Synthesis
A new method of Synthesis by Analysis for multi-component signals with fast-changing instantaneous attributes is introduced. It makes use of two recent developments in signal decomposition to obtain near mono-component signals whose instantaneous attributes can be used for synthesis. Furthermore, by extension and combination of both decomposition methods, the overall quality of the decomposition is shown to improve considerably.
The Simplest Analysis Method for Non-Stationary Sinusoidal Modeling
This paper introduces an analysis method based on the generalization of the phase vocoder approach to non-stationary sinusoidal modeling. This new method is then compared to the reassignment method for the estimation of all the parameters of the model (phase, amplitude, frequency, amplitude modulation, and frequency modulation), and to the Cramér-Rao bounds. It turns out that this method is comparable to the state of the art in terms of performance, with the great advantage of being much simpler.
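For orientation, the classic stationary phase vocoder estimator that the paper generalizes can be sketched as follows. This covers only the stationary special case (frequency from the phase advance of one DFT bin between two frames); the paper's estimation of amplitude and frequency modulation is not reproduced. The Hann window, the one-sample hop, the test signal, and the function names are assumptions chosen to keep the example short and free of phase-unwrapping ambiguity.

```python
import cmath
import math

def windowed_bin(x, start, size, k):
    """DFT bin k of a Hann-windowed frame of `size` samples beginning at `start`."""
    acc = 0j
    for n in range(size):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / size)
        acc += w * x[start + n] * cmath.exp(-2j * math.pi * k * n / size)
    return acc

def phase_vocoder_frequency(x, fs, size, k):
    """Estimate frequency in Hz from the phase advance of bin k over a one-sample hop."""
    first = windowed_bin(x, 0, size, k)
    second = windowed_bin(x, 1, size, k)
    # Phase difference, wrapped to (-pi, pi]; unambiguous below fs / 2 for a hop of 1.
    dphi = cmath.phase(second * first.conjugate())
    return dphi * fs / (2 * math.pi)

fs = 8000.0
true_freq = 1010.0  # deliberately not centred on a DFT bin
x = [math.cos(2 * math.pi * true_freq * t / fs + 0.4) for t in range(257)]
estimate = phase_vocoder_frequency(x, fs, size=256, k=32)  # bin 32 is centred at 1000 Hz
```

Even though the sinusoid sits between bins, the phase-difference estimate lands very close to `true_freq`, which is the property the phase vocoder family of estimators relies on.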
Metamorph: Real-Time High-Level Sound Transformations Based on a Sinusoids Plus Noise Plus Transients Model
Spectral models provide ways to manipulate musical audio signals that can be both powerful and intuitive, but high-level control is often required in order to provide flexible real-time control over the potentially large parameter set. This paper introduces Metamorph, a new open-source library for high-level sound transformation. We describe the real-time sinusoids plus noise plus transients model that is used by Metamorph and explain the opportunities that it provides for sound manipulation.
Range-constrained Phase Reconstruction for Recovering Time-domain Signal from Quantized Amplitude & Phase Spectrogram
This paper describes a novel algorithm for recovering a time-domain signal from a quantized amplitude and phase spectrogram, which is applicable to spectrogram-based audio coding. To obtain better sound quality, a phase reconstruction technique is first applied under the constraint that the phase in each time-frequency bin stays within its quantization range; the time-domain signal is then recovered by the standard inverse short-time Fourier transform. Experimental evaluation based on the objective PEAQ measure shows that the proposed range-constrained phase reconstruction is effective for improving the sound quality.
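The range constraint itself can be pictured as a projection step. The sketch below assumes a uniform phase quantizer with `levels` steps over a full turn, which the abstract does not specify, and it leaves out the phase reconstruction algorithm entirely; the function names are hypothetical. Whatever phase a reconstruction step proposes for a bin, the projection moves it to the nearest value inside that bin's quantization interval.

```python
import math

def wrap(phi):
    """Wrap an angle to the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi

def quantize_phase(phi, levels):
    """Index of the uniform quantization cell (assumed quantizer) containing phi."""
    step = 2 * math.pi / levels
    return round(wrap(phi) / step) % levels

def constrain_phase(phi, index, levels):
    """Project a proposed phase onto the quantization interval of cell `index`."""
    step = 2 * math.pi / levels
    center = index * step
    deviation = wrap(phi - center)                        # signed distance from cell centre
    deviation = max(-step / 2, min(step / 2, deviation))  # clamp to the cell's half-width
    return wrap(center + deviation)

levels = 8                                   # cells are pi/4 wide
inside = constrain_phase(1.0, 1, levels)     # already inside cell 1: left unchanged
outside = constrain_phase(2.0, 1, levels)    # too large: pulled back to the cell edge
across_wrap = constrain_phase(-3.0, 4, levels)  # cell 4 straddles the +/- pi boundary
```

Computing the deviation on the wrapped difference is what keeps the projection correct for the cell that straddles the ±π boundary.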
Effective Separation of Low-Pitch Notes Using NMF Using Non-Power-of-2 Discrete Fourier Transforms
Non-negative matrix factorization (NMF), applied to decompose signals in the frequency domain by means of the short-time Fourier transform (STFT), has recently become widely used in audio source separation. Separation of low-pitch notes in recordings is of significant interest. According to the time-frequency uncertainty principle, this separation may suffer from the tradeoff between time and frequency localization for low-pitch sounds. Furthermore, because the window function applied to the signal causes frequency spreading, separation of low-pitch notes becomes more difficult. Instead of using a power-of-2 FFT, we experiment with STFT sizes corresponding to the pitches of the notes in the signals. Computer simulations using synthetic signals show that the Source to Interferences Ratio (SIR) is significantly improved without sacrificing the Sources to Artifacts Ratio (SAR) and the Source to Distortion Ratio (SDR). On average, an improvement of at least 2 to 6 dB in SIR is achieved compared with power-of-2 FFTs of similar sizes.
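One way to see why a pitch-related transform size can help is to compare spectral leakage for a low note. The sketch below is an illustration under stated assumptions, not the paper's procedure: it assumes the transform size is chosen so that a whole number of periods of the fundamental fits in the frame, uses a rectangular window and a single sinusoid, and all names and parameter values are hypothetical.

```python
import cmath
import math

def dft_energy(x):
    """Squared magnitudes of DFT bins 0..N/2, computed directly (no FFT needed)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def pitch_matched_size(fs, f0, periods):
    """Frame length (in samples) holding `periods` cycles of the fundamental f0."""
    return round(periods * fs / f0)

def peak_energy_fraction(fs, f0, n):
    """Share of spectral energy captured by the strongest bin for a sinusoid at f0."""
    x = [math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]
    energy = dft_energy(x)
    return max(energy) / sum(energy)

fs, f0 = 8000, 50.0
n_matched = pitch_matched_size(fs, f0, periods=4)      # 640 samples
matched = peak_energy_fraction(fs, f0, n_matched)      # energy concentrated in one bin
power_of_two = peak_energy_fraction(fs, f0, 512)       # 3.2 periods per frame: leakage
```

With the matched size the fundamental falls on a bin centre and almost all energy lands in a single bin, while the 512-point frame spreads it over neighbouring bins. Real recordings rarely give an exact integer number of samples per period, so the benefit in practice would be smaller than in this idealized case.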
Shifted NMF with Group Sparsity for Clustering NMF Basis Functions
Recently, Non-negative Matrix Factorisation (NMF) has found application in separation of individual sound sources. NMF decomposes the spectrogram of an audio mixture into an additive parts-based representation where the parts typically correspond to individual notes or chords. However, there is a need to cluster the NMF basis functions to their sources. Although many attempts have been made to improve the clustering of the basis functions to sources, much research is still required in this area. Recently, Shifted Non-negative Matrix Factorisation (SNMF) was used to cluster these basis functions. To this end, we propose that the incorporation of group sparsity into the Shifted NMF based methods may benefit the clustering algorithms. We have tested this on SNMF algorithms with improved separation quality. Results show that this gives improved clustering of pitched basis functions over previous methods.
Sparse Decomposition, Clustering and Noise for Fire Texture Sound Re-Synthesis
In this paper we introduce a framework that represents environmental texture sounds as a linear superposition of independent foreground and background layers that roughly correspond to entities in the physical production of the sound. Sound samples are decomposed into a sparse representation with the matching pursuit algorithm and a dictionary of Daubechies wavelet atoms. An agglomerative clustering procedure groups atoms into short transient molecules. A foreground layer is generated by sampling these sound molecules from a distribution, whose parameters are estimated from the input sample. The residual signal is modelled by an LPC-based source-filter model, synthesizing the background sound layer. The capability of the system is demonstrated with a set of fire sounds.
A jump start for NMF with N-FINDR and NNLS
Nonnegative Matrix Factorization is a popular tool for the analysis of audio spectrograms. It is usually initialized with random data, after which it iteratively converges to a local optimum. In this paper we show that N-FINDR and NNLS, popular techniques for dictionary and activation matrix learning in remote sensing, prove useful to create a better starting point for NMF. This reduces the number of iterations necessary to come to a decomposition of similar quality. Adapting algorithms from the hyperspectral image unmixing and remote sensing communities provides an interesting direction for future research in audio spectrogram factorization.
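The role of the starting point can be seen in a minimal NMF loop that takes its initial factors as arguments. The sketch below uses the well-known Lee-Seung multiplicative updates for the Euclidean cost with random initialization; it does not implement N-FINDR or NNLS, and the matrix sizes, iteration count, and helper names are assumptions for illustration. A better-informed initial `w` and `h` would simply be passed in place of the random ones.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, w, h, iterations=50, eps=1e-12):
    """Multiplicative updates (Euclidean cost) starting from the supplied w and h."""
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(len(h))]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(w[0]))]
             for i in range(len(w))]
    return w, h

def random_matrix(rows, cols, rng):
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
# Synthetic rank-2 "spectrogram" (4 frequency bins x 6 frames) standing in for real data.
v = matmul(random_matrix(4, 2, rng), random_matrix(2, 6, rng))
w0, h0 = random_matrix(4, 2, rng), random_matrix(2, 6, rng)   # random starting point
err_before = frobenius_error(v, w0, h0)
w, h = nmf(v, w0, h0)
err_after = frobenius_error(v, w, h)
```

Because the updates only rescale the current factors, the quality of `w0` and `h0` determines both which local optimum is reached and how many iterations are needed to get there.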