On Vibrato and Frequency (De)Modulation in Musical Sounds
Vibrato is an important characteristic in human musical performance and is often uniquely characteristic to a player and/or a particular instrument. This work is motivated by the assumption (often made in the source separation literature) that vibrato aids in the identification of multiple sound sources playing in unison. It follows that its removal, the focus herein, may contribute to a more blended combination. In signals, vibrato is often modeled as an oscillatory deviation from a center pitch/frequency that presents in the sound as phase/frequency modulation. While vibrato implementation using a time-varying delay line is well known, using a delay line for its removal is less so. In this work we focus on (de)modulation of vibrato in a signal by first showing the relationship between modulation and corresponding demodulation delay functions and then suggesting a solution for increased vibrato removal in the latter by ensuring sideband attenuation below the threshold of audibility. Two known methods for estimating the instantaneous frequency/phase are used to construct delay functions from both contrived and musical examples so that vibrato removal may be evaluated.
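The delay-line view of vibrato mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `apply_vibrato`, the sinusoidal delay function, the linear interpolation, and all parameter values are assumptions chosen for illustration. Reading a signal through a delay d(t) that varies in time scales its instantaneous frequency by (1 - d'(t)), which is why a modulated delay line behaves as a frequency modulator.

```python
import math

def apply_vibrato(signal, sample_rate, rate_hz, depth_s, base_delay_s):
    """Read `signal` through a sinusoidally modulated delay line.

    Assumes base_delay_s >= depth_s so the delay never becomes negative.
    Fractional delays are handled with linear interpolation.
    """
    out = []
    for n in range(len(signal)):
        # Time-varying delay in samples: base delay plus sinusoidal deviation.
        lfo = math.sin(2 * math.pi * rate_hz * n / sample_rate)
        delay = (base_delay_s + depth_s * lfo) * sample_rate
        pos = n - delay              # fractional read position
        i = math.floor(pos)
        frac = pos - i
        if i < 0 or i + 1 >= len(signal):
            out.append(0.0)          # nothing to read yet
        else:
            out.append((1 - frac) * signal[i] + frac * signal[i + 1])
    return out

# Illustrative values only: a 440 Hz tone with a 5 Hz vibrato.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
wobbly = apply_vibrato(tone, sample_rate, rate_hz=5.0, depth_s=0.0005, base_delay_s=0.002)
```

With these assumed values the peak frequency deviation is 2π · 5 · 0.0005, roughly 1.6% of the carrier frequency. Setting `depth_s` to zero reduces the function to a fixed 16-sample delay.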
RIR2FDN: An Improved Room Impulse Response Analysis and Synthesis
This paper seeks to improve the state-of-the-art in delay-network-based analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-learning approach. A formal listening test was conducted where participants assessed the similarity of reverberated material across seven distinct RIRs and three different sound sources. The results reveal that the performance of these methods is influenced by both the excitation sounds and the reverberation conditions. Nonetheless, the proposed method consistently demonstrates higher similarity ratings compared to the end-to-end approach across most conditions. However, achieving an indistinguishable synthesis of measured RIRs remains a persistent challenge, underscoring the complexity of this problem. Overall, this work helps improve the sound quality of analysis-based artificial reverberation.
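For readers unfamiliar with the structure, a generic feedback delay network can be sketched in a few lines. This is a textbook-style illustration, not the optimized network the paper proposes: the function name `fdn_impulse_response`, the Householder feedback matrix, the single broadband gain, and the delay lengths are all assumptions.

```python
def fdn_impulse_response(delays, gain, length):
    """Impulse response of a small FDN with a Householder feedback matrix.

    `delays` are positive delay-line lengths in samples; `gain` < 1 is a
    broadband loop gain that makes the response decay.
    """
    n = len(delays)
    buffers = [[0.0] * d for d in delays]   # one circular buffer per delay line
    heads = [0] * n
    out = []
    for t in range(length):
        x = 1.0 if t == 0 else 0.0          # unit impulse input
        # Read the oldest sample of every delay line.
        taps = [buffers[i][heads[i]] for i in range(n)]
        out.append(sum(taps))
        # Householder mixing A = I - (2/n) * ones(n, n), an orthogonal matrix.
        shared = 2.0 * sum(taps) / n
        for i in range(n):
            buffers[i][heads[i]] = x + gain * (taps[i] - shared)
            heads[i] = (heads[i] + 1) % delays[i]
    return out

# Illustrative values only: four mutually prime delays and a loop gain of 0.9.
fdn_ir = fdn_impulse_response([7, 11, 13, 17], gain=0.9, length=2000)
```

The first nonzero output sample appears after the shortest delay (7 samples here). Because the mixing matrix is orthogonal, the decay is governed entirely by `gain`; a practical design would replace it with frequency-dependent attenuation filters.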
Modeling the Frequency-Dependent Sound Energy Decay of Acoustic Environments with Differentiable Feedback Delay Networks
Differentiable machine learning techniques have recently proved effective for finding the parameters of Feedback Delay Networks (FDNs) so that their output matches desired perceptual qualities of target room impulse responses. However, we show that existing methods tend to fail at modeling the frequency-dependent behavior of sound energy decay that characterizes real-world environments unless properly trained. In this paper, we introduce a novel perceptual loss function based on the mel-scale energy decay relief, which generalizes the well-known time-domain energy decay curve to multiple frequency bands. We also augment the prototype FDN by incorporating differentiable wideband attenuation and output filters, and train them via backpropagation along with the other model parameters. The proposed approach improves upon existing strategies for designing and training differentiable FDNs, making it more suitable for audio processing applications where realistic and controllable artificial reverberation is desirable, such as gaming, music production, and virtual reality.
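The time-domain energy decay curve that the abstract refers to is commonly computed by Schroeder backward integration. The sketch below shows only that broadband curve; the function name `energy_decay_curve_db` and the synthetic impulse response are assumptions, and the paper's mel-scale energy decay relief and loss function are not reproduced here.

```python
import math

def energy_decay_curve_db(impulse_response):
    """Schroeder backward integration, normalised to 0 dB at time zero."""
    remaining = 0.0
    tail_energy = []
    # Accumulate squared samples from the end of the response backwards.
    for sample in reversed(impulse_response):
        remaining += sample * sample
        tail_energy.append(remaining)
    tail_energy.reverse()
    total = tail_energy[0]
    if total == 0.0:
        raise ValueError("impulse response is silent")
    return [10 * math.log10(e / total) if e > 0 else float("-inf")
            for e in tail_energy]

# Illustrative input: a noiseless exponential decay.
ir = [math.exp(-n / 50.0) for n in range(500)]
edc = energy_decay_curve_db(ir)
```

An energy decay relief applies the same backward integration independently to each frequency band of a time-frequency representation, which gives one decay curve per band.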
Binaural Dark-Velvet-Noise Reverberator
Binaural late-reverberation modeling necessitates the synthesis of frequency-dependent inter-aural coherence, a crucial aspect of spatial auditory perception. Prior studies have explored methodologies such as filtering and cross-mixing two incoherent late reverberation impulse responses to emulate the coherence observed in measured binaural late reverberation. In this study, we introduce two variants of the binaural dark-velvet-noise reverberator. The first one uses cross-mixing of two incoherent dark-velvet-noise sequences that can be generated efficiently. The second variant is a novel time-domain jitter-based approach. The methods’ accuracies are assessed through objective and subjective evaluations, revealing that both methods yield comparable performance and clear improvements over using incoherent sequences. Moreover, the advantages of the jitter-based approach over cross-mixing are highlighted by introducing a parametric width control, based on the jitter-distribution width, into the binaural dark velvet noise reverberator. The jitter-based approach can also introduce time-dependent coherence modifications without additional computational cost.
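The cross-mixing idea can be illustrated with a broadband simplification. This is a sketch under loud assumptions: Gaussian noise stands in for the dark-velvet-noise sequences, the target coherence is a single frequency-independent number, and the names `cross_mix` and `correlation` are hypothetical. For two uncorrelated, equal-power inputs, mixing with an angle θ gives left and right signals whose normalised correlation is cos(2θ).

```python
import math
import random

def cross_mix(a, b, coherence):
    """Mix two incoherent, equal-power sequences into a left/right pair
    whose broadband correlation approximates `coherence` (between -1 and 1)."""
    theta = 0.5 * math.acos(coherence)
    c, s = math.cos(theta), math.sin(theta)
    left = [c * x + s * y for x, y in zip(a, b)]
    right = [c * x - s * y for x, y in zip(a, b)]
    return left, right

def correlation(u, v):
    """Normalised correlation of two sequences assumed to be zero-mean."""
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return num / den

# Stand-in for two incoherent noise sequences (seeded for repeatability).
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(20000)]
b = [rng.gauss(0.0, 1.0) for _ in range(20000)]
left, right = cross_mix(a, b, coherence=0.6)
measured = correlation(left, right)
```

The measured value only approximates the target because the two finite noise sequences are not exactly equal in power. A frequency-dependent coherence, as the abstract requires, would need the mixing to vary with frequency, for example by applying it per band.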
Differentiable Active Acoustics - Optimizing Stability via Gradient Descent
Active acoustics (AA) refers to an electroacoustic system that actively modifies the acoustics of a room. For common use cases, the number of transducers (loudspeakers and microphones) involved in the system is large, resulting in a large number of system parameters. To optimally blend the response of the system into the natural acoustics of the room, the parameters require careful tuning, which is a time-consuming process performed by an expert. In this paper, we present a differentiable AA framework, which allows multi-objective optimization without impairing architecture flexibility. The system is implemented in PyTorch to be easily translated into a machine-learning pipeline, thus automating the tuning process. The objective of the pipeline is to optimize the digital signal processor (DSP) component to evenly distribute the energy in the feedback loop across frequencies. We investigate the effectiveness of DSPs composed of finite impulse response filters, which are unconstrained during the optimization. We study the effect of multiple filter orders, number of transducers, and loss functions on the performance. Different loss functions behave similarly for systems with few transducers and low-order filters. Increasing the number of transducers and the order of the filters improves results and accentuates the difference in the performance of the loss functions.
Naturalness of Double-Slope Decay in Generalised Active Acoustic Enhancement Systems
Active acoustic enhancement systems (AAESs) alter the perceived acoustics of a space by using microphones and loudspeakers to introduce sound energy into the room. Double-sloped energy decay may be observed in these systems. However, it is unclear which conditions lead to this effect, and to what extent double sloping reduces the perceived naturalness of the reverberation compared to Sabine decay. This paper uses simulated combinations of AAES parameters to identify which cases affect the objective curvature of the energy decay. A subjective test with trained listeners assessed the naturalness of these conditions. Using an AAES model, room impulse responses were generated for varying room dimensions, absorption coefficients, channel counts, system loop gains and reverberation times (RTs) of the artificial reverberator. The objective double sloping was strongly correlated to the ratio between the reverberator and passive room RTs, but parameters such as absorption and room size did not have a profound effect on curvature. It was found that double sloping significantly reduced the perceived naturalness of the reverberation, especially when the reverberator RT was greater than two times that of the passive room. Double sloping had more effect on the naturalness ratings when subjects listened to a more absorptive passive room, and also when using speech rather than transient stimuli. Lowering the loop gain by 9 dB increased the naturalness of the double-sloped stimuli, where some were rated as significantly more natural than the Sabine decay stimuli from the passive room.
A Common-Slopes Late Reverberation Model Based on Acoustic Radiance Transfer
In rooms with complex geometry and uneven distribution of energy losses, late reverberation depends on the positions of sound sources and listeners. More precisely, the decay of energy is characterised by a sum of exponential curves with position-dependent amplitudes and position-independent decay rates (hence the name common slopes). The amplitude of different energy decay components is a particularly important perceptual aspect that requires efficient modeling in applications such as virtual reality and video games. Acoustic Radiance Transfer (ART) is a room acoustics model focused on late reverberation, which uses a pre-computed acoustic transfer matrix based on the room geometry and materials, and allows interactive changes to source and listener positions. In this work, we present an efficient common-slopes approximation of the ART model. Our technique extracts common slopes from ART using modal decomposition, retaining only the non-oscillating energy modes. Leveraging the structure of ART, changes to the positions of sound sources and listeners only require minimal processing. Experimental results show that even very few slopes are sufficient to capture the positional dependency of late reverberation, reducing model complexity substantially.
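The common-slopes description in this abstract, a sum of exponentials with shared decay rates and position-dependent amplitudes, can be written out directly. The sketch below evaluates such a model only; it does not extract slopes from ART. The two decay times, the amplitude sets, and the names `COMMON_T60` and `decay_curve` are hypothetical values chosen for illustration.

```python
import math

# Hypothetical decay times (T60, in seconds) shared by every position in the room.
COMMON_T60 = [0.4, 1.6]

def decay_curve(amplitudes, times):
    """Energy decay sum_k A_k * exp(-t * ln(1e6) / T60_k).

    `amplitudes` holds one position-dependent weight A_k per common slope.
    An energy drop of 60 dB is a factor of 1e-6, hence the ln(1e6) term.
    """
    rates = [math.log(1e6) / t60 for t60 in COMMON_T60]
    return [sum(a * math.exp(-r * t) for a, r in zip(amplitudes, rates))
            for t in times]

times = [0.05 * i for i in range(40)]     # 0 to 1.95 s
near = decay_curve([1.0, 0.05], times)    # position dominated by the fast slope
far = decay_curve([0.2, 0.8], times)      # position dominated by the slow slope
```

Moving the source or listener changes only the amplitude list; the decay rates stay fixed, which is what keeps position updates cheap in a model of this form.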
Differentiable MIMO Feedback Delay Networks for Multichannel Room Impulse Response Modeling
Recently, with the advent of new performing headsets and goggles, the demand for Virtual and Augmented Reality applications has experienced a steep increase. In order to coherently navigate the virtual rooms, the acoustics of the scene must be emulated in the most accurate and efficient way possible. Amongst others, Feedback Delay Networks (FDNs) have proved to be valuable tools for tackling such a task. In this article, we expand and adapt a method recently proposed for the data-driven optimization of single-input-single-output FDNs to the multiple-input-multiple-output (MIMO) case for addressing spatial/space-time processing applications. By testing our methodology on items taken from two different datasets, we show that the parameters of MIMO FDNs can be jointly optimized to match some perceptual characteristics of given multichannel room impulse responses, outperforming approaches available in the literature, and paving the way toward increasingly efficient and accurate real-time virtual room acoustics rendering.
A Highly Parametrized Scattering Delay Network Implementation for Interactive Room Auralization
Scattering Delay Networks (SDNs) are an interesting approach to artificial reverberation, with parameters tied to the room’s physical properties and the computational efficiency of delay networks. This paper presents a highly-parametrized and real-time plugin of an SDN. The SDN plugin allows for interactive room auralization, enabling users to modify the parameters affecting the reverberation in real time. These parameters include source and receiver positions, room shape and size, and wall absorption properties. This makes our plugin suitable for applications that require real-time and interactive spatial audio rendering, such as virtual or augmented reality frameworks and video games. Additionally, the main contributions of this work include a filter design method for wall sound absorption, as well as plugin features such as air absorption modeling, various output formats (mono, stereo, binaural, and first to fifth order Ambisonics), open sound control (OSC) for controlling source and receiver parameters, and a graphical user interface (GUI). Evaluation tests showed that the reverberation time and the filter design approach are consistent with both theoretical references and real-world measurements. Finally, performance analysis indicated that the SDN plugin requires minimal computational resources.
Equalizing Loudspeakers in Reverberant Environments Using Deep Convolutive Dereverberation
Loudspeaker equalization is an established topic in the literature, and currently many techniques are available to address most practical use cases. However, most of these rely on accurate measurements of the loudspeaker in an anechoic environment, which in some cases is not feasible. This is the case, e.g., for custom digital organs, which have a set of loudspeakers built into a large and geometrically complex piece of furniture. Such an instrument may be too heavy and large to be transported to a measurement room, or may require a large one, making traditional impulse response measurements impractical for most users. In this work we propose a method to find the inverse of the sound emission system in a reverberant environment, based on a Deep Learning dereverberation algorithm. The method is agnostic of the room characteristics and can thus be conducted in an automated fashion in any environment. A real use case is discussed and results are provided, showing the effectiveness of the approach in designing filters that closely match the magnitude response of the ideal inverting filters.