Distortion and Pitch Processing Using a Modal Reverberator Architecture
A reverberator based on a room response modal analysis is adapted to produce distortion, pitch and time manipulation effects, as well as gated and iterated reverberation. The so-called “modal reverberator” is a parallel collection of resonant filters, with resonance frequencies and dampings tuned to the modal frequencies and decay times of the space or object being simulated. Here, the resonant filters are implemented as cascades of heterodyning, smoothing, and modulation steps, forming a type of analysis/synthesis architecture. By applying memoryless nonlinearities to the modulating sinusoids, distortion effects are produced, including distortion without intermodulation products. By using different frequencies for the heterodyning and associated modulation operations, pitch manipulation effects are generated, including pitch shifting and spectral “inversion.” By resampling the smoothing filter output, the signal time axis is stretched without introducing pitch changes. As these effects are integrated into a reverberator architecture, reverberation controls such as decay time can be used to produce novel effects having some of the sonic characteristics of reverberation.
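The heterodyne, smooth, modulate cascade described in this abstract can be illustrated for a single mode. The following is a minimal sketch, not the authors' implementation: the function name `modal_resonator`, its parameters, and the choice of a complex one-pole lowpass as the smoothing filter are all assumptions made for illustration.

```python
import cmath
import math


def modal_resonator(x, fs, f_analysis, f_synthesis, t60):
    """One mode: heterodyne to baseband, smooth, then modulate back up.

    All names here are hypothetical. The smoothing filter is assumed to be
    a one-pole lowpass whose pole radius sets a 60 dB decay time of t60 s.
    """
    # Pole radius such that the impulse response falls by 60 dB in t60 seconds.
    a = 10.0 ** (-3.0 / (t60 * fs))
    state = 0j
    y = []
    for n, sample in enumerate(x):
        t = n / fs
        # Heterodyne: shift the analysis frequency down to 0 Hz.
        baseband = sample * cmath.exp(-2j * math.pi * f_analysis * t)
        # Smooth: the one-pole lowpass determines the mode's decay.
        state = a * state + (1.0 - a) * baseband
        # Modulate: shift back up, possibly to a different frequency.
        y.append((state * cmath.exp(2j * math.pi * f_synthesis * t)).real)
    return y
```

With `f_synthesis` equal to `f_analysis` the cascade behaves as an ordinary resonant filter; choosing a different `f_synthesis` moves the mode's output to a new frequency, which is the basis of the pitch manipulation mentioned in the abstract. A full reverberator would sum many such modes in parallel.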
Stereo signal separation and upmixing by mid-side decomposition in the frequency-domain
An algorithm to estimate the perceived azimuth directions in a stereo signal is derived from a typical signal model. These estimated directions can then be used to separate direct and ambient signal components and to remix the original stereo track. The processing is based on the idea of a bandwise mid-side decomposition in the frequency domain, which allows an intuitive and easy-to-understand mathematical derivation. An implementation as a stereo-to-five-channel upmix is able to deliver a high-quality surround experience at low computational cost and demonstrates the practical applicability of the presented approach.
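As a reminder of what a mid-side decomposition computes, here is a minimal sketch. It assumes the common convention M = (L + R) / 2 and S = (L - R) / 2; the abstract does not state the normalization used in the paper, and the function names are hypothetical. Because the arithmetic is elementwise, the same functions work on real samples or on complex frequency-domain bins of one band.

```python
def mid_side(left, right):
    """Split a stereo pair into mid and side components (assumed 1/2 scaling)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def left_right(mid, side):
    """Invert mid_side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

A source panned exactly to the centre has identical left and right values, so its side component is zero; energy in the side signal therefore indicates off-centre or ambient content.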
Automatic subgrouping of multitrack audio
Subgrouping is a mixing technique where the outputs of a subset of audio tracks in a multitrack are summed to a single audio bus. This is done so that the mix engineer can apply signal processing to an entire subgroup, speed up the mix workflow and manipulate a number of audio tracks at once. In this work, we investigate which audio features from a set of 159 can be used to automatically subgroup multitrack audio. We determine a subset of the original 159 audio features to use for automatic subgrouping by performing feature selection using a Random Forest classifier on a dataset of 54 individual multitracks. We show that by using agglomerative clustering on 5 test multitracks, the entire set of audio features incorrectly clusters 35.08% of the audio tracks, while the subset of audio features incorrectly clusters only 7.89% of the audio tracks. Furthermore, we show that using the entire set of audio features, ten incorrect subgroups are created, whereas using the subset of audio features, only five incorrect subgroups are created. This indicates that our reduced set of audio features provides a significant increase in classification accuracy for the automatic creation of subgroups.
Separation of musical notes with highly overlapping partials using phase and temporal constrained complex matrix factorization
In note separation of polyphonic music, separating overlapping partials is an important and difficult problem. Fifths and octaves, the most challenging intervals, nevertheless occur frequently. Non-negative matrix factorization (NMF) employs the constraints of energy and harmonic ratio to tackle this problem. Recently, complex matrix factorization (CMF) was proposed, which incorporates phase information into the source separation problem. However, temporal magnitude modulation remains severe for fifths and octaves when CMF is applied. In this work, we investigate a temporal smoothness model based on the CMF approach. The temporal activation coefficient of a preceding note is constrained when the succeeding notes appear. Compared to unconstrained CMF, the magnitude modulation is greatly reduced in our computer simulation. Performance indices including source-to-interference ratio (SIR), source-to-artifacts ratio (SAR), source-to-distortion ratio (SDR), as well as modulation error ratio (MER) are given.
Automatic calibration and equalization of a line array system
This paper presents an automated Public Address processing unit, using delay and magnitude response adjustment. The aim is to achieve a flat frequency response and delay alignment between speakers at different physical positions, as observed at the measuring point, a task that is nowadays usually performed manually by the sound technician. The adjustment is obtained by applying three signal processing operations to the audio signal: time delay adjustment, crossover filtering, and graphic equalization. The automation lies in the calculation of the different parameter sets: estimation of the time delay, selection of a suitable crossover frequency, and calculation of the gains for a third-octave graphic equalizer. These automatic methods reduce time and effort in the calibration of line-array PA systems, since only three sine sweeps must be played through the sound system. Measurements have been conducted in an anechoic chamber using a 1:10 scale model of a line array system to verify the functioning of the automatic calibration and equalization methods.
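The abstract does not say how the time delay is estimated. One common approach, shown here purely as an assumed illustration, is to pick the lag that maximizes the cross-correlation between the reference signal and the signal captured at the measuring point. The function name and the brute-force search are hypothetical and chosen for clarity rather than speed.

```python
def estimate_delay(reference, measured, max_lag):
    """Return the lag in samples (0..max_lag) maximizing the cross-correlation.

    Assumes `measured` is a delayed, possibly scaled copy of `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the measurement shifted back by `lag`.
        score = sum(r * m for r, m in zip(reference, measured[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sampling rate gives the delay in seconds, which could then be compensated by delaying the earlier-arriving speaker.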
AM/FM DAFx
In this work we explore audio effects based on the manipulation of estimated AM/FM decomposition of input signals, followed by resynthesis. The framework is based on an incoherent monocomponent-based decomposition. Contrary to reports that discourage the usage of this simple scenario, our results have shown that the artefacts introduced in the audio produced are acceptable and not even noticeable in some cases. Useful and musically interesting effects were obtained in this study, illustrated with audio samples that accompany the text. We also make available Octave code for future experiments and new Csound opcodes for real-time implementations.
On comparison of phase alignments of harmonic components
This paper provides a method for comparing phase angles of harmonic sound sources. In particular, we propose an algorithm for decomposing the difference between two sets of phases into a harmonic part, which represents the phase progress of harmonic components, and a residue part, which represents all causes of deviations from perfect harmonicity. This decomposition allows us to compare phase alignments regardless of an arbitrary time shift, and to handle harmonic and noise/inharmonic parts of the phase angle separately to improve existing algorithms that handle harmonic sound sources using phase measurements. These benefits are demonstrated with a new phase-based pitch marking algorithm and an improved time-scale and pitch modification scheme using traditional harmonic sinusoidal modelling.
Towards Transient Restoration in Score-informed Audio Decomposition
Our goal is to improve the perceptual quality of transient signal components extracted in the context of music source separation. Many state-of-the-art techniques are based on applying a suitable decomposition to the magnitude of the Short-Time Fourier Transform (STFT) of the mixture signal. The phase information required for the reconstruction of individual component signals is usually taken from the mixture, resulting in a complex-valued, modified STFT (MSTFT). There are different methods for reconstructing a time-domain signal whose STFT approximates the target MSTFT. Due to phase inconsistencies, these reconstructed signals are likely to contain artifacts such as pre-echos preceding transient components. In this paper, we propose a simple, yet effective extension of the iterative signal reconstruction procedure by Griffin and Lim to remedy this problem. In a first experiment, under laboratory conditions, we show that our method considerably attenuates pre-echos while still showing similar convergence properties as the original approach. A second, more realistic experiment involving score-informed audio decomposition shows that the proposed method still yields improvements, although to a lesser extent, under non-idealized conditions.
Towards an Invertible Rhythm Representation
This paper investigates the development of a rhythm representation of music audio signals that (i) is able to tackle rhythm-related tasks and (ii) is invertible, i.e. is suitable for reconstructing audio from it with the corresponding rhythm content being preserved. A conventional front-end processing schema is applied to the audio signal to extract time-varying characteristics (accent features) of the signal. Next, a periodicity analysis method is proposed that is capable of reconstructing the accent features. Afterwards, a network consisting of Restricted Boltzmann Machines is applied to the periodicity function to learn a latent representation. This latent representation is finally used to tackle two distinct rhythm tasks, namely dance style classification and meter estimation. The results are promising for both input signal reconstruction and rhythm classification performance. Moreover, the proposed method is extended to generate random samples from the corresponding classes.
Low-delay vector-quantized subband ADPCM coding
Several modern applications require audio encoders featuring low data rates and very low delays. In terms of delay, Adaptive Differential Pulse Code Modulation (ADPCM) encoders are advantageous compared to block-based codecs due to their instantaneous output and are therefore preferred in time-critical applications. If the audio signal transport is done block-wise anyway, as in Audio over IP (AoIP) scenarios, additional advantages can be expected from block-wise coding. In this study, a generalized subband ADPCM concept using vector quantization with multiple realizations and configurations is shown. Additionally, a way of optimizing the codec parameters is derived. The results show that at the cost of small algorithmic delays the data rate of ADPCM can be significantly reduced while obtaining a similar or slightly increased perceptual quality. The largest algorithmic delay of about 1 ms at 44.1 kHz is still smaller than that of well-known low-delay codecs.
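To make the "instantaneous output" property of ADPCM concrete, the sketch below shows a plain scalar, fullband ADPCM loop that emits one code per input sample. It is not the vector-quantized subband scheme of the paper: the predictor (the previous reconstructed sample), the step-size adaptation factors, and all function names are assumptions chosen for illustration only.

```python
def _step_update(step, code, levels):
    # Assumed adaptation rule: grow the step after large codes, shrink otherwise.
    factor = 1.5 if abs(code) >= levels // 2 else 0.9
    return min(max(step * factor, 1e-4), 1.0)


def adpcm_encode(x, bits=4):
    """Encode samples as integer codes in [-2**(bits-1), 2**(bits-1) - 1]."""
    levels = 2 ** (bits - 1)
    pred, step, codes = 0.0, 0.01, []
    for sample in x:
        # Quantize the prediction error with the current step size.
        code = max(-levels, min(levels - 1, round((sample - pred) / step)))
        codes.append(code)
        # Track the decoder's reconstruction so both sides stay in sync.
        pred += code * step
        step = _step_update(step, code, levels)
    return codes


def adpcm_decode(codes, bits=4):
    """Rebuild the signal by mirroring the encoder's state updates."""
    levels = 2 ** (bits - 1)
    pred, step, y = 0.0, 0.01, []
    for code in codes:
        pred += code * step
        y.append(pred)
        step = _step_update(step, code, levels)
    return y
```

Because each code depends only on past samples, this scalar loop adds no block delay. For scale, the 1 ms delay quoted in the abstract corresponds to about 44 samples at 44.1 kHz.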