Download Analysis and Trans-synthesis of Acoustic Bowed-String Instrument Recordings: a Case Study using Bach Cello Suites In this paper, analysis and trans-synthesis of acoustic bowed string instrument recordings with new non-negative matrix factorization (NMF) procedure are presented. This work shows that it may require more than one template to represent a note according to time-varying behavior of timbre, especially played by bowed string instruments. The proposed method improves original NMF without the knowledge of tone models and the number of required templates in advance. Resultant NMF information is then converted into the synthesis parameters of the sinusoidal synthesis. Bach cello suites recorded by Fournier and Starker are used in the experiments. Analysis and trans-synthesis examples of the recordings are also provided. Index Terms—trans-synthesis, non-negative matrix factorization, bowed string instrument
Download On the Use of Perceptual Properties for Melody Estimation This paper is about the use of perceptual principles for melody estimation. The melody stream is understood as generated by the most dominant source. Since the source with the strongest energy may not be perceptually the most dominant one, it is proposed to study the perceptual properties for melody estimation: loudness, masking effect and timbre similarity. The related criteria are integrated into a melody estimation system and their respective contributions are evaluated. The effectiveness of these perceptual criteria is confirmed by the evaluation results using more than one hundred excerpts of music recordings.
Download On Stretching Gaussian Noises with the Phase Vocoder Recently, the processing of non-sinusoidal signals, or sound textures, has become an important topic in various areas. In general, the transformation is done by the phase vocoder techniques. Since the phase vocoder technique is based on a sinusoidal model, it’s performance is not satisfying when applied to transform sound textures. The following article investigates into the problem using as example the most basic non-sinusoidal sounds, that are noise signals. We demonstrate the problems that arise when time stretching noise with the phase vocoder, provide a description of some relevant statistical properties of the time frequency representation of noise and introduce an algorithm that allows to preserve these statistical properties when time stretching noise with the phase vocoder. The resulting algorithm significantly improves the perceptual quality of the time stretched noise signals and therefore it is seen as a promising first step towards an algorithm for transformation of sound textures.
Download On the Modeling of Sound Textures Based on the STFT Representation Sound textures are often noisy and chaotic. The processing of these sounds must be based on the statistics of its corresponding time-frequency representation. In order to transform sound textures with existing mechanisms, a statistical model based on the STFT representation is favored. In this article, the relation between statistics of a sound texture and its time-frequency representation is explored. We proposed an algorithm to extract and modify the statistical properties of a sound texture based on its STFT representation. It allows us to extract the statistical model of a sound texture and resynthesise the sound texture after modifications have been made. It could also be used to generate new samples of the sound texture from a given sample. The results of the experiment show that the algorithm is capable of generating high quality sounds from an extracted model. This result could serve as a basis for transformations like morphing or high-level control of sound textures.
Download Timbre-Constrained Recursive Time-Varying Analysis for Musical Note Separation Note separation in music signal processing becomes difficult when there are overlapping partials from co-existing notes produced by either the same or different musical instruments. In order to deal with this problem, it is necessary to involve certain invariant features of musical instrument sounds into the separation processing. For example, the timbre of a note of a musical instrument may be used as one possible invariant feature. In this paper, a timbre estimate is used to represent this feature such that it becomes a constraint when note separation is performed on a mixture signal. To demonstrate the proposed method, a timedependent recursive regularization analysis is employed. Spectral envelopes of different notes are estimated and a modified parameter update strategy is applied to the recursive regularization process. The experiment results show that the flaws due to the overlapping partial problem can be effectively reduced through the proposed approach.
Download Searching for Music Mixing Graphs: A Pruning Approach Music mixing is compositional — experts combine multiple audio processors to achieve a cohesive mix from dry source tracks. We propose a method to reverse engineer this process from the input and output audio. First, we create a mixing console that applies all available processors to every chain. Then, after the initial console parameter optimization, we alternate between removing redundant processors and fine-tuning. We achieve this through differentiable implementation of both processors and pruning. Consequently, we find a sparse mixing graph that achieves nearly identical matching quality of the full mixing console. We apply this procedure to drymix pairs from various datasets and collect graphs that also can be used to train neural networks for music mixing applications.
Download Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data Recent years have seen increasing interest in applying deep learning methods to the modeling of guitar amplifiers or effect pedals. Existing methods are mainly based on the supervised approach, requiring temporally-aligned data pairs of unprocessed and rendered audio. However, this approach does not scale well, due to the complicated process involved in creating the data pairs. A very recent work done by Wright et al. has explored the potential of leveraging unpaired data for training, using a generative adversarial network (GAN)-based framework. This paper extends their work by using more advanced discriminators in the GAN, and using more unpaired data for training. Specifically, drawing inspiration from recent advancements in neural vocoders, we employ in our GANbased model for guitar amplifier modeling two sets of discriminators, one based on multi-scale discriminator (MSD) and the other multi-period discriminator (MPD). Moreover, we experiment with adding unprocessed audio signals that do not have the corresponding rendered audio of a target tone to the training data, to see how much the GAN model benefits from the unpaired data. Our experiments show that the proposed two extensions contribute to the modeling of both low-gain and high-gain guitar amplifiers.
Download GRAFX: An Open-Source Library for Audio Processing Graphs in Pytorch We present GRAFX, an open-source library designed for handling audio processing graphs in PyTorch. Along with various library functionalities, we describe technical details on the efficient parallel computation of input graphs, signals, and processor parameters in GPU. Then, we show its example use under a music mixing scenario, where parameters of every differentiable processor in a large graph are optimized via gradient descent. The code is available at https://github.com/sh-lee97/grafx.