End-to-end equalization with convolutional neural networks
This work aims to implement a novel deep learning architecture to perform audio processing in the context of matched equalization. Most existing methods for automatic and matched equalization perform effectively, and their goal is to find a transfer function that matches a given frequency response. Nevertheless, these procedures require prior knowledge of the type of filters to be modeled. In addition, fixed filter bank architectures are required in automatic mixing contexts. Based on end-to-end convolutional neural networks, we introduce a general purpose architecture for equalization matching. Thus, by using an end-to-end learning approach, the model approximates the equalization target as a content-based transformation without directly finding the transfer function. The network learns how to process the audio directly in order to match the equalized target audio. We train the network through unsupervised and supervised learning procedures. We analyze what the model is actually learning and how the given task is accomplished. We show the model performing matched equalization for shelving, peaking, lowpass and highpass IIR and FIR equalizers.
Contact Sensor Processing for Acoustic Instrument Sensor Matching Using a Modal Architecture
This paper proposes a method to filter the output of instrument contact sensors to approximate the response of a well-placed microphone. A modal approach is proposed in which mode frequencies and damping ratios are fit to the frequency response of the contact sensor, and the mode gains are then determined for both the contact sensor and the microphone. The mode frequencies and damping ratios are presumed to be associated with the resonances of the instrument. Accordingly, the corresponding contact sensor and microphone mode gains will account for the instrument radiation. The ratios between the contact sensor and microphone gains are then used to create a parallel bank of second-order biquad filters that filters the contact sensor signal to estimate the microphone signal.
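The parallel bank of second-order filters mentioned in the abstract can be illustrated with a minimal sketch. It assumes each mode is realized as a two-pole resonator whose pole radius and angle follow from the mode frequency and damping ratio, and whose input gain is the microphone-to-sensor gain ratio. The function name `modal_filter_bank`, the tuple layout, and this particular resonator form are illustrative assumptions, not the paper's filter design.

```python
import math

def modal_filter_bank(x, modes, fs):
    """Filter x through a parallel bank of two-pole resonators.

    modes: list of (freq_hz, damping_ratio, gain_ratio) tuples (hypothetical layout).
    fs: sampling rate in Hz.
    """
    y = [0.0] * len(x)
    for freq, zeta, gain in modes:
        w0 = 2.0 * math.pi * freq
        r = math.exp(-zeta * w0 / fs)                     # pole radius from damping
        theta = w0 * math.sqrt(1.0 - zeta * zeta) / fs    # pole angle (damped frequency)
        a1, a2 = 2.0 * r * math.cos(theta), r * r
        y1 = y2 = 0.0                                     # per-mode filter state
        for n, sample in enumerate(x):
            out = gain * sample + a1 * y1 - a2 * y2
            y[n] += out                                   # parallel branches are summed
            y2, y1 = y1, out
    return y

# Impulse response of a two-mode bank (mode values are made up for illustration)
h = modal_filter_bank([1.0, 0.0, 0.0, 0.0],
                      [(100.0, 0.01, 0.5), (250.0, 0.02, 2.0)], fs=8000)
```

Because the branches run in parallel, each mode can be tuned or dropped independently of the others.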
TU-Note Violin Sample Library – A Database of Violin Sounds with Segmentation Ground Truth
The presented sample library of violin sounds is designed as a tool for the research, development and testing of sound analysis/synthesis algorithms. The library features single sounds which cover the entire frequency range of the instrument in four dynamic levels, two-note sequences for the study of note transitions and vibrato, as well as solo pieces for performance analysis. All parts come with a hand-labeled segmentation ground truth that marks attack, release and transition/transient segments. Additional relevant information on the samples’ properties is provided for single sounds and two-note sequences. Recordings took place in an anechoic chamber with a professional violinist and a recording engineer, using two microphone positions. This document describes the content and the recording setup in detail, alongside basic statistical properties of the data.
Parametric Multi-Channel Separation and Re-Panning of Harmonic Sources
In this paper, a method for separating stereophonic mixtures into their harmonic constituents is proposed. The method is based on a harmonic signal model. An observed mixture is decomposed by first estimating the panning parameters of the sources, and then estimating the fundamental frequencies and the amplitudes of the harmonic components. The number of sources and their panning parameters are estimated using an approach based on clustering of narrowband interaural level and time differences. The panning parameter distribution is modelled as a Gaussian mixture and the generalized variance is used for selecting the number of sources. The fundamental frequencies of the sources are estimated using an iterative approach. To enforce spectral smoothness when estimating the fundamental frequencies, a codebook of magnitude amplitudes is used to limit the amount of energy assigned to each harmonic. The source models are used to form Wiener filters which are used to reconstruct the sources. The proposed method can be used for source re-panning (demonstration given), remixing, and multi-channel upmixing, e.g. for hi-fi systems with multiple loudspeakers.
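The Wiener filtering step can be illustrated with a minimal sketch. It assumes that each source model provides a power value per frequency bin and uses the textbook single-channel Wiener gain (source power divided by total modelled power). The function name `wiener_gains`, the data layout, and the small `eps` regularizer are assumptions for illustration; the paper's multi-channel formulation may differ.

```python
def wiener_gains(source_powers, eps=1e-12):
    """Per-bin Wiener gains for each source.

    source_powers[i][k]: modelled power of source i in frequency bin k
    (hypothetical layout). eps avoids division by zero in empty bins.
    """
    n_bins = len(source_powers[0])
    totals = [sum(p[k] for p in source_powers) + eps for k in range(n_bins)]
    return [[p[k] / totals[k] for k in range(n_bins)] for p in source_powers]

# Two sources, three bins (made-up powers)
gains = wiener_gains([[4.0, 0.0, 1.0],
                      [0.0, 2.0, 3.0]])
# A source estimate in bin k is then gains[i][k] * X[k], with X the mixture spectrum.
```

In every bin with nonzero modelled power the gains sum to approximately one, so the source estimates add back up to the mixture.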
Fast Partial Tracking of Audio with Real-Time Capability through Linear Programming
This paper proposes a new partial tracking method, based on linear programming, that can run in real-time, is simple to implement, and performs well in difficult tracking situations by considering spurious peaks, crossing partials, and a non-stationary short-term sinusoidal model. Complex constant parameters of a generalized short-term signal model are explicitly estimated to inform peak matching decisions. Peak matching is formulated as a variation of the linear assignment problem. Combinatorially optimal peak-to-peak assignments are found in polynomial time using the Hungarian algorithm. Results show that the proposed method creates high-quality representations of monophonic and polyphonic sounds.
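To make the linear assignment formulation concrete, here is a small sketch that matches spectral peaks between two consecutive frames. Two simplifications are assumed: the cost of pairing two peaks is just their absolute frequency difference (the paper derives costs from estimated signal-model parameters), and the optimum is found by exhaustive search over permutations rather than by the Hungarian algorithm. Exhaustive search returns the same optimal assignment but takes factorial time, so it only suits tiny examples. The function name `match_peaks` is hypothetical.

```python
from itertools import permutations

def match_peaks(prev_freqs, next_freqs):
    """Optimal one-to-one peak matching by brute force (illustration only).

    Returns (pairs, total_cost), where pairs is a list of
    (index_in_prev, index_in_next) tuples.
    """
    n = len(prev_freqs)
    assert n == len(next_freqs), "this sketch assumes equal peak counts"
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Assumed cost: absolute frequency difference in Hz
        cost = sum(abs(prev_freqs[i] - next_freqs[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost

pairs, cost = match_peaks([440.0, 880.0, 1320.0], [1318.0, 442.0, 879.0])
# pairs == [(0, 1), (1, 2), (2, 0)], cost == 5.0
```

A polynomial-time solver such as the Hungarian algorithm would replace the permutation loop without changing the cost definition or the result.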
Modal Analysis Of Room Impulse Responses Using Subband ESPRIT
This paper describes a modification of the ESPRIT algorithm which can be used to determine the parameters (frequency, decay time, initial magnitude and initial phase) of a modal reverberator that best match a provided room impulse response. By applying perceptual criteria we are able to match room impulse responses using a variable number of modes, with an emphasis on high quality for lower mode counts; this allows the synthesis algorithm to scale to different computational environments. A hybrid FIR/modal reverb architecture is also presented which allows for the efficient modeling of room impulse responses that contain sparse early reflections and dense late reverb. MUSHRA tests comparing the analysis/synthesis using various mode numbers for our algorithms, and for another state-of-the-art algorithm, are included as well.
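The four modal parameters listed in the abstract are enough to resynthesize an impulse response as a sum of exponentially decaying sinusoids. The sketch below covers only this synthesis side, not the ESPRIT analysis. It assumes that "decay time" means the time for a mode to fall by 60 dB (T60); the paper may define it differently. The function name `modal_impulse_response` and the tuple layout are hypothetical.

```python
import math

def modal_impulse_response(modes, fs, n_samples):
    """Sum of decaying sinusoids.

    modes: list of (freq_hz, t60_s, magnitude, phase_rad) tuples (hypothetical layout).
    Assumes decay time is T60, i.e. the envelope drops 60 dB after t60_s seconds.
    """
    h = [0.0] * n_samples
    for freq, t60, mag, phase in modes:
        decay = 10.0 ** (-3.0 / (t60 * fs))   # per-sample envelope factor
        for n in range(n_samples):
            h[n] += mag * decay ** n * math.cos(2.0 * math.pi * freq * n / fs + phase)
    return h

# Two made-up modes at a 1 kHz sampling rate
h = modal_impulse_response([(50.0, 0.5, 1.0, 0.0), (120.0, 0.2, 0.3, 0.5)],
                           fs=1000, n_samples=200)
```

Because the modes are summed independently, the synthesis cost grows linearly with the number of modes.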
FAST MUSIC – An Efficient Implementation Of The MUSIC Algorithm For Frequency Estimation Of Approximately Periodic Signals
Noise subspace methods are popular for estimating the parameters of complex sinusoids in the presence of uncorrelated noise and have applications in musical instrument modeling and microphone array processing. One such algorithm, MUSIC (Multiple Signal Classification), has been popular for its ability to resolve closely spaced sinusoids. However, the computational efficiency of MUSIC is relatively low, since it requires an explicit eigenvalue decomposition of an autocorrelation matrix, followed by a linear search over a large space. In this paper, we discuss methods for and the benefits of converting the Toeplitz structure of the autocorrelation matrix to circulant form, so that eigenvalue decomposition can be replaced by a Fast Fourier Transform (FFT) of one row of the matrix. This transformation requires modeling the signal as at least approximately periodic over some duration. For these periodic signals, the pseudospectrum calculation becomes trivial and the accuracy of the frequency estimates only depends on how well periodicity detection works. We derive a closed-form expression for the pseudospectrum, yielding large savings in computation time. We test our algorithm to resolve closely spaced piano partials.
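The key linear-algebra fact behind the circulant conversion is that the eigenvectors of any circulant matrix are the DFT basis vectors, and its eigenvalues are the DFT of its first row. The sketch below checks this numerically with a plain O(N²) DFT in place of an FFT, to stay dependency-free. It illustrates that fact only, not the paper's pseudospectrum derivation; all function names are hypothetical.

```python
import cmath

def dft(x):
    """Plain O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def circulant(first_row):
    """Circulant matrix whose row i is the first row rotated right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def eigen_residual(first_row):
    """Return (eigenvalues, worst |C v - lambda v|) over all DFT eigenpairs."""
    n = len(first_row)
    C = circulant(first_row)
    lams = dft(first_row)
    worst = 0.0
    for k in range(n):
        v = [cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)]
        Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        worst = max(worst, max(abs(Cv[i] - lams[k] * v[i]) for i in range(n)))
    return lams, worst

# Symmetric first row, as for a circulant autocorrelation matrix (made-up values)
lams, worst = eigen_residual([4.0, 1.0, 0.5, 1.0])
```

For a symmetric first row the eigenvalues come out real up to rounding error, as expected for a symmetric matrix.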
Hard real-time onset detection of percussive instruments
To date, the most successful onset detectors are those based on a frequency representation of the signal. However, for such methods the time between the physical onset and the reported one is unpredictable and may vary widely according to the type of sound being analyzed. Such variability and unpredictability of spectrum-based onset detectors may not be convenient in some real-time applications. This paper proposes a real-time method to improve the temporal accuracy of state-of-the-art onset detectors. The method is grounded on the theory of hard real-time operating systems, where the result of a task must be reported by a certain deadline. It consists of the combination of a time-based technique (which has a high degree of accuracy in detecting the physical onset time but is more prone to false positives and false negatives) with a spectrum-based technique (which has a high detection accuracy but a low temporal accuracy). The developed hard real-time onset detector was tested on a dataset of single non-pitched percussive sounds using the high frequency content detector as the spectral technique. Experimental validation showed that the proposed approach was effective in better retrieving the physical onset time of about 50% of the hits detected by the spectral technique, with an average improvement of about 3 ms and a maximum of about 12 ms. The results also revealed that the use of a longer deadline may better capture the variability of the spectral technique, but at the cost of higher latency.
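The high frequency content (HFC) measure used as the spectral technique can be sketched as follows. This assumes one common definition, the sum of squared bin magnitudes weighted by bin index; other variants weight the unsquared magnitude. The sketch covers only the per-frame feature, not the paper's deadline-based combination with a time-based detector. A plain DFT is used in place of an FFT, and the function name `hfc` is hypothetical.

```python
import cmath
import math

def hfc(frame):
    """High frequency content of one frame: sum over bins k of k * |X[k]|^2.

    Only bins 0..N/2 are used, since the frame is assumed to be real-valued.
    """
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        xk = sum(frame[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        total += k * abs(xk) ** 2
    return total

# Two unit-amplitude sinusoids, each with a whole number of cycles in 64 samples
low = [math.sin(2.0 * math.pi * 4 * m / 64) for m in range(64)]
high = [math.sin(2.0 * math.pi * 20 * m / 64) for m in range(64)]
# hfc(high) exceeds hfc(low) because the energy sits in a higher bin
```

An onset detector would typically compare this value frame by frame against a threshold, since percussive attacks briefly add broadband, high-frequency energy.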
MusikVerb: A Harmonically Adaptive Audio Reverberation
We present MusikVerb, a novel digital reverberation capable of adapting its output to the harmonic context of a live music performance. The proposed reverberation is aware of the harmonic content of an audio input signal and ‘tunes’ the reverberation output to its harmonic content using a spectral filtering technique. The dynamic behavior of MusikVerb avoids the sonic clutter of traditional reverberation, and most importantly, fosters creative endeavor by providing new expressive and musically-aware uses of reverberation. Despite its applicability to any input audio signal, the proposed effect has been designed primarily as a guitar pedal effect and a standalone software application.
A Virtual Tube Delay Effect
A virtual tube delay effect based on the real-time simulation of acoustic wave propagation in a garden hose is presented. The paper describes the acoustic measurements conducted and the analysis of the sound propagation in long narrow tubes. The obtained impulse responses are used to design delay lines and digital filters, which simulate the propagation delay, losses, and reflections from the end of the tube, which may be open, closed, or acoustically attenuated. A study on the reflection caused by a finite-length tube is described. The resulting system consists of a digital waveguide model and produces delay effects with realistic low-pass filtering. A stereo delay effect plugin has been implemented in Pure Data and is described here.
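The basic structure of a delay line followed by a loss filter can be sketched in a few lines. This assumes a single echo path with an integer sample delay and a one-pole low-pass filter standing in for the measured tube losses. The paper designs its filters from measured impulse responses and also models end reflections, which this sketch omits. The function name `tube_echo` and its parameters are hypothetical.

```python
def tube_echo(x, delay, gain=0.5, damping=0.6):
    """Mix the input with one delayed, low-pass filtered copy of itself.

    delay: propagation delay in samples (integer, assumed).
    gain: level of the delayed path.
    damping: one-pole low-pass coefficient in [0, 1); higher means darker.
    """
    out = []
    state = 0.0  # low-pass filter memory
    for n, sample in enumerate(x):
        delayed = x[n - delay] if n >= delay else 0.0
        # One-pole low-pass as a stand-in for frequency-dependent tube losses
        state = (1.0 - damping) * delayed + damping * state
        out.append(sample + gain * state)
    return out

# Impulse input: the echo appears 3 samples later, smeared by the low-pass
y = tube_echo([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], delay=3)
```

A longer tube corresponds to a larger `delay`, and stronger high-frequency loss corresponds to a larger `damping`.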