Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility
Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This paper demonstrates how existing GUI elements can be translated into an interactive auditory domain using high-order Ambisonics and inertial sensor-based head tracking, culminating in real-time binaural rendering over headphones. The proposed system spatializes the auditory output of VoiceOver, the built-in macOS screen reader, aiming to foster clearer mental mapping and enhanced navigability. A between-groups experiment was conducted to compare standard VoiceOver with the proposed spatialized version. Sighted participants (n = 32), with no visual access to the test interface, completed a list-based exploration and then attempted to reconstruct the UI solely from auditory cues. Experimental results indicate that the head-tracked group achieved slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant differences in self-reported workload or usability. These findings suggest potential benefits from integrating head-tracked binaural audio into mainstream screen-reader workflows, but future investigations involving blind and low-vision users are needed. Although the experimental testbed uses a generic desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio approach could benefit visually impaired producers and musicians navigating plug-in controls.
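The sound-field rotation at the heart of such a head-tracked renderer can be sketched in a few lines. The following is a minimal, illustrative example, not the paper's implementation: it assumes first-order Ambisonics, yaw-only tracking, and a W/X/Y channel layout chosen for clarity; binaural decoding via HRTFs would follow this stage.

```python
import numpy as np

def encode_foa(mono, azimuth):
    """Encode a mono signal into first-order Ambisonics (W, X, Y),
    horizontal plane only, source at `azimuth` radians (0 = front)."""
    w = mono / np.sqrt(2.0)
    x = mono * np.cos(azimuth)
    y = mono * np.sin(azimuth)
    return np.stack([w, x, y])

def rotate_yaw(foa, yaw):
    """Counter-rotate the sound field by the listener's head yaw so
    sources stay fixed in world coordinates while the head turns."""
    w, x, y = foa
    c, s = np.cos(-yaw), np.sin(-yaw)
    return np.stack([w, c * x - s * y, s * x + c * y])

sig = np.random.default_rng(0).standard_normal(480)
foa = encode_foa(sig, azimuth=np.pi / 2)   # source hard left
rotated = rotate_yaw(foa, yaw=np.pi / 2)   # listener turns to face it
# the source is now straight ahead: energy in X, (almost) none in Y
print(np.allclose(rotated[2], 0.0, atol=1e-9))
```

Driving `rotate_yaw` from an inertial sensor's yaw estimate, block by block, is what makes a spatialized UI element stay anchored in space as the listener's head moves.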
Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations
This study evaluates the performance of five objective audio quality metrics—PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL, and HAAQI—in the context of digital music production. Unlike previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under fixed-level, randomly fluctuating degradations. In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic effectively tracked degradation changes, while PEAQ Advanced failed consistently and ViSQOL showed low sensitivity to hum and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021), PEAQ Basic (0.057), and PEAQ Advanced (0.065). However, ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity. These findings highlight the strengths and limitations of each metric for music production, specifically quality measurement with compressed audio. The source code and dataset will be made publicly available upon publication.
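The degradation types used in the effectiveness test are standard and easy to reproduce. A minimal sketch follows; the levels, hum frequency, and sample rate are assumptions for illustration, not the paper's test conditions.

```python
import numpy as np

FS = 44100  # assumed sample rate

def add_hum(x, level, freq=50.0):
    """Mains hum: a fixed-frequency sinusoid mixed in at a linear level."""
    t = np.arange(len(x)) / FS
    return x + level * np.sin(2 * np.pi * freq * t)

def add_hiss(x, level, seed=0):
    """Broadband hiss: additive white Gaussian noise."""
    rng = np.random.default_rng(seed)
    return x + level * rng.standard_normal(len(x))

def clip(x, threshold):
    """Hard clipping at a symmetric amplitude threshold."""
    return np.clip(x, -threshold, threshold)

x = 0.8 * np.sin(2 * np.pi * 440 * np.arange(FS) / FS)  # 1 s test tone
degraded = clip(add_hiss(add_hum(x, level=0.05), level=0.01), threshold=0.5)
```

A "glitch" degradation could be generated analogously by zeroing or repeating short random segments; each metric is then fed the clean and degraded pair.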
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single, potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced feedback delay network (FDN) is designed to generate perceptually plausible room impulse responses (RIRs). A multiple stimuli with hidden reference and anchor (MUSHRA) test and a two-alternative-forced-choice (2AFC) discrimination task have been conducted to compare the proposed method against ground truth recordings and conventional RT-based approaches. The results show that the proposed system delivers robust performance in various scenarios, achieving highly plausible reverberation synthesis.
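The FDN half of such a hybrid can be illustrated with a toy four-line network. This sketch is not the proposed design: the delay lengths, gain, and unity output taps are arbitrary choices. It uses an orthonormal Hadamard feedback matrix, which keeps the loop stable whenever the per-line gain is below one.

```python
import numpy as np

def fdn_impulse_response(n_samples, delays=(149, 211, 263, 293), g=0.97):
    """Impulse response of a minimal 4-line feedback delay network with
    an orthonormal Hadamard feedback matrix and per-line gain g < 1."""
    H = 0.5 * np.array([[1,  1,  1,  1],
                        [1, -1,  1, -1],
                        [1,  1, -1, -1],
                        [1, -1, -1,  1]])
    bufs = [np.zeros(d) for d in delays]
    idx = [0, 0, 0, 0]
    out = np.zeros(n_samples)
    for n in range(n_samples):
        taps = np.array([bufs[i][idx[i]] for i in range(4)])
        out[n] = taps.sum()             # simple unity output taps
        fb = g * (H @ taps)             # mix and attenuate the feedback
        x = 1.0 if n == 0 else 0.0      # unit impulse input
        for i in range(4):
            bufs[i][idx[i]] = x + fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return out

ir = fdn_impulse_response(44100)
```

In a hybrid system, the early part of the response would come from ray tracing, with an FDN tail like this cross-faded in; mutually prime delay lengths reduce audible periodicity in the tail.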
DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation
Reverberation is crucial in the acoustical design of physical spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of RESs strongly depends on the properties of the physical room and the architecture of the Digital Signal Processor (DSP). However, room-impulse-response (RIR) measurements and the DSP code from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present DataRES and PyRES—a RIR dataset and a Python library to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs. The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like pipeline. The replication of previous studies by the authors shows that PyRES can become a useful tool in future research on RESs.
Auditory Discrimination of Early Reflections in Virtual Rooms
This study investigates the perceptual sensitivity to early reflection changes across different spatial directions in a virtual reality (VR) environment. Using an ABX discrimination paradigm, participants evaluated speech stimuli convolved with third-order Ambisonic room impulse responses under three position reversals (Left–Right, Front–Back, and Floor–Ceiling) and three reverberation conditions (RT60 = 1.0 s, 0.6 s, and 0.2 s). Binomial tests revealed that participants consistently detected early reflection differences in the Left–Right reversal, while discrimination performance in the other two directions remained at or near chance. This result can be explained by the higher acuity and lower localisation blur of the human auditory system in the left–right dimension. A two-way ANOVA confirmed a significant main effect of spatial position (p = 0.00685, η² = 0.1605), with no significant effect of reverberation or interaction. The analysis of the binaural room impulse responses showed waveform and direct-to-reverberant ratio (DRR) differences in the Left–Right reversal position, aligning with perceptual results. However, no definitive causal link between DRR variations and perceptual outcomes can yet be established.
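The per-direction binomial analysis is straightforward to reproduce. A sketch with invented counts follows (the real trial numbers and scores are in the paper, not here), testing each condition against the ABX chance level of p = 0.5:

```python
from scipy.stats import binomtest

n_trials = 40                        # hypothetical trials per condition
correct = {"Left-Right": 33,         # hypothetical correct-response counts
           "Front-Back": 23,
           "Floor-Ceiling": 21}

for condition, k in correct.items():
    # one-sided test: is performance above the 50% ABX chance level?
    result = binomtest(k, n_trials, p=0.5, alternative="greater")
    print(f"{condition}: {k}/{n_trials}, p = {result.pvalue:.4f}")
```

With these made-up counts only the Left–Right condition clears p < 0.05, mirroring the qualitative pattern the abstract reports.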
Biquad Coefficients Optimization via Kolmogorov-Arnold Networks
Conventional Deep Learning (DL) approaches to Infinite Impulse Response (IIR) filter coefficient estimation from arbitrary frequency responses are quite limited. They often suffer from inefficiencies such as tight training requirements, high complexity, and limited accuracy. As an alternative, in this paper, we explore the use of Kolmogorov-Arnold Networks (KANs) to predict IIR filter—specifically biquad—coefficients effectively. By leveraging the high interpretability and accuracy of KANs, we achieve smooth coefficient optimization. Furthermore, by constraining the search space and exploring different loss functions, we demonstrate improved performance in speed and accuracy. Our approach is evaluated against other existing differentiable IIR filter solutions. The results show significant advantages of KANs over existing methods, offering steadier convergence and more accurate results. This offers new possibilities for integrating IIR filters into deep-learning frameworks.
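The forward mapping that such an estimator must invert, from five biquad coefficients to a frequency response, can be sketched with the well-known RBJ audio-EQ-cookbook formulas; the filter type, centre frequency, gain, and Q below are illustrative choices, not the paper's setup.

```python
import numpy as np
from scipy.signal import freqz

def peaking_biquad(fc, fs, gain_db, q):
    """RBJ audio-EQ-cookbook peaking filter; returns normalized (b, a)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

b, a = peaking_biquad(fc=1000.0, fs=48000.0, gain_db=6.0, q=1.0)
w, h = freqz(b, a, worN=4096, fs=48000.0)
peak_hz = w[np.argmax(np.abs(h))]   # boost centred near 1 kHz
```

A differentiable version of exactly this coefficients-to-magnitude-response mapping is what a KAN, or any network, must learn to run in reverse.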
Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values
Audio effects and sound synthesizers are widely used processors in popular music. Their parameters control the quality of the output sound. Multiple combinations of parameters can lead to the same sound. While recent approaches have been proposed to estimate these parameters given only the output sound, those are deterministic, i.e. they only estimate a single solution among the many possible parameter configurations. In this work, we propose to model the parameters as probability distributions instead of deterministic values. To learn the distributions, we optimize two objectives: (1) we minimize the reconstruction error between the ground truth output sound and the one generated using the estimated parameters, as is usually done, but also (2) we maximize the parameter diversity, using entropy. We evaluate our approach through two numerical audio experiments to show its effectiveness. These results show how our approach effectively outputs multiple combinations of parameters to match one sound.
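The many-to-one property that motivates this work is easy to demonstrate with a toy processor. The sketch below is invented for illustration (processor, candidate set, and weighting are not the paper's model); it simply shows the two terms of such an objective on a case where four distinct parameter pairs produce the identical sound.

```python
import numpy as np

def process(params, t):
    """Toy processor: two gains in series, so every pair with the same
    product g1 * g2 produces exactly the same output sound."""
    g1, g2 = params
    return g1 * g2 * np.sin(2 * np.pi * 440 * t)

t = np.arange(1024) / 44100
target = process((0.5, 0.8), t)          # ground-truth product = 0.4

# candidates sampled from an (assumed) estimated parameter distribution;
# all share the product 0.4, hence all reconstruct the target exactly
candidates = [(0.4, 1.0), (0.5, 0.8), (0.8, 0.5), (1.0, 0.4)]

# objective (1): reconstruction error of each candidate's output
recon = np.mean([(process(p, t) - target) ** 2 for p in candidates])
# objective (2): diversity, here the entropy of a uniform distribution
# over the four distinct candidates
entropy = np.log(len(candidates))
loss = recon - 0.1 * entropy             # minimise error, reward diversity
```

A purely deterministic estimator would collapse onto one of these four pairs; the entropy term rewards keeping all of them.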
A Parametric Equalizer with Interactive Poles and Zeros Control for Digital Signal Processing Education
This article presents ZePolA, a digital audio equalizer designed as an educational resource for understanding digital filter design. Unlike conventional equalization plug-ins, which define the frequency response first and then derive the filter coefficients, this software adopts an inverse approach: users directly manipulate the placement of poles and zeros on the complex plane, with the corresponding frequency response visualized in real time. This methodology provides an intuitive link between theoretical filter concepts and their practical application. The plug-in features three main panels: a filter parameter panel, a frequency response panel, and a filter design panel. It allows users to configure a cascade of first- or second-order filter elements, each parameterized by the location of its poles or zeros. The GUI supports interaction through drag-and-drop gestures, enabling immediate visual and auditory feedback. This hands-on approach is intended to enhance learning by bridging the gap between theoretical knowledge and practical application. To assess the educational value and usability of the plug-in, a preliminary evaluation was conducted with focus groups of students and lecturers. Future developments will include support for additional filter types and increased architectural flexibility. Moreover, a systematic validation study involving students and educators is proposed to quantitatively evaluate the plug-in’s impact on learning outcomes. This work contributes to the field of digital signal processing education by offering an innovative tool that merges the hands-on approach of music production with a deeper theoretical understanding of digital filters, fostering an interactive and engaging educational experience.
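The workflow the plug-in teaches, from pole/zero placement to frequency response, can be sketched directly; the pole radius and angle below are arbitrary illustrative values for a single second-order element.

```python
import numpy as np
from scipy.signal import freqz_zpk

fs = 48000.0
theta = 2 * np.pi * 1000.0 / fs              # pole angle for roughly 1 kHz
r = 0.99                                     # radius: closer to 1 = sharper
poles = [r * np.exp(1j * theta), r * np.exp(-1j * theta)]
zeros = []                                   # no zeros: a pure resonator

w, h = freqz_zpk(zeros, poles, k=1.0, worN=8192, fs=fs)
peak_hz = w[np.argmax(np.abs(h))]            # resonance close to 1 kHz
```

Dragging the conjugate pole pair outward toward the unit circle sharpens the resonance, while changing its angle shifts the peak frequency; this is exactly the coupling between the complex plane and the response curve that the plug-in makes visible.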
Zero-Phase Sound via Giant FFT
Given the speed of FFT computation on current computer hardware, new possibilities arise for transforming very long sounds. A zero-phase version of any audio signal can be obtained by zeroing the phase angle of its complex spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to enhance the resulting sound quality. As a result, a sound with the same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are retained, however. Zero-phase sounds are palindromic in the sense that they are symmetric in time. A comparison of the zero-phase conversion to the autocorrelation function helps to understand its properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same autocorrelation function as the original sound. One exciting variation of the method is to apply the method separately to the real and imaginary parts of the spectrum to produce a stereo effect. A frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
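The core conversion is only a few lines. The sketch below omits the paper's transient-suppression and gain-compensation refinements and uses an assumed pad factor:

```python
import numpy as np

def zero_phase(x, pad_factor=2):
    """Zero-phase version of a signal: keep the magnitude spectrum,
    zero the phase angle, inverse-transform, and centre the result.
    Zero-padding leaves room for the time-symmetric tails."""
    X = np.fft.fft(x, pad_factor * len(x))
    y = np.fft.ifft(np.abs(X)).real      # |X| has zero phase everywhere
    return np.fft.fftshift(y)            # centre the palindrome

x = np.random.default_rng(1).standard_normal(4096)
y = zero_phase(x)
# palindromic: the result is symmetric in time about its centre
print(np.allclose(y[1:], y[1:][::-1]))
```

Because the magnitude spectrum, and hence the power spectrum, is unchanged, the result shares the original's autocorrelation, which is why repeating rhythmic structure survives the transformation.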
Partiels – Exploring, Analyzing and Understanding Sounds
This article presents Partiels, an open-source application developed at IRCAM to analyze digital audio files and explore sound characteristics. The application uses Vamp plug-ins to extract various information on different aspects of the sound, such as spectrum, partials, pitch, tempo, text, and chords. Partiels is the successor to AudioSculpt, offering a modern, flexible interface for visualizing, editing, and exporting analysis results, addressing a wide range of issues from musicological practice to sound creation and signal processing research. The article describes Partiels’ key features, including analysis organization, audio file management, results visualization and editing, as well as data export and sharing options, and its interoperability with other software such as Max and Pure Data. In addition, it highlights the numerous analysis plug-ins developed at IRCAM, based in particular on machine learning models, as well as the IRCAM Vamp extension, which overcomes certain limitations of the original Vamp format.