Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility

Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This
paper demonstrates how existing GUI elements can be translated
into an interactive auditory domain using high-order Ambisonics and inertial sensor-based head tracking, culminating in real-time binaural rendering over headphones. The proposed system
is designed to spatialize the auditory output from VoiceOver, the
built-in macOS screen reader, aiming to foster clearer mental mapping and enhanced navigability.
A between-groups experiment
was conducted to compare standard VoiceOver with the proposed
spatialized version. Sighted participants (n = 32),
with no visual access to the test interface, completed a list-based
exploration and then attempted to reconstruct the UI solely from
auditory cues. Experimental results indicate that the head-tracked
group achieved a slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant
differences in self-reported workload or usability. These findings
suggest that potential benefits may come from the integration of
head-tracked binaural audio into mainstream screen-reader workflows, but future investigations involving blind and low-vision users
are needed.
Although the experimental testbed uses a generic
desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio
approach could benefit visually impaired producers and musicians
navigating plug-in controls.
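To illustrate the core signal path such a system relies on, the following minimal sketch (Python with NumPy) encodes a mono cue into horizontal first-order Ambisonics, counter-rotates the sound field by the listener's head yaw, and decodes to stereo. The channel layout, function names, and the crude virtual-cardioid decode are illustrative assumptions, not the paper's actual VoiceOver pipeline, which uses high-order Ambisonics and HRTF-based binaural rendering.

```python
import numpy as np

def encode_fo(signal, azimuth_rad):
    """Encode a mono signal into horizontal first-order Ambisonics
    (channels W, Y, X; SN3D-style, illustrative)."""
    w = signal
    y = np.sin(azimuth_rad) * signal
    x = np.cos(azimuth_rad) * signal
    return w, y, x

def rotate_yaw(w, y, x, yaw_rad):
    """Counter-rotate the sound field by the head yaw so the auditory
    scene stays world-locked while the head turns."""
    x_r = np.cos(yaw_rad) * x + np.sin(yaw_rad) * y
    y_r = -np.sin(yaw_rad) * x + np.cos(yaw_rad) * y
    return w, y_r, x_r

def decode_stereo(w, y, x):
    """Crude virtual-cardioid stereo decode (left at +90 deg, right at
    -90 deg); a real system would convolve with HRTFs instead."""
    left = 0.5 * (w + y)
    right = 0.5 * (w - y)
    return left, right
```

With a source encoded straight ahead, turning the head 90 degrees to the left moves the cue entirely into the right channel, which is exactly the world-locked behaviour head tracking is meant to provide.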
Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations

This study evaluates the performance of five objective audio quality metrics—PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL, and HAAQI—in the context of digital music production. Unlike
previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation
types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under
fixed-level, randomly fluctuating degradations.
In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic
effectively tracked degradation changes, while PEAQ Advanced
failed consistently and ViSQOL showed low sensitivity to hum
and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations
of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021),
PEAQ Basic (0.057), and PEAQ Advanced (0.065).
However,
ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity.
These findings highlight the strengths and limitations of each
metric for music production, specifically quality measurement with
compressed audio. The source code and dataset will be made publicly available upon publication.
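The degradation types named above are simple to synthesize. The following sketch shows how hum, hiss, and clipping test stimuli could be generated (glitches are omitted for brevity); the function names and default levels are illustrative assumptions, not the study's actual code.

```python
import numpy as np

def add_hum(x, sr, level, freq=50.0):
    """Additive mains hum: a sinusoid at the (assumed) mains frequency."""
    t = np.arange(len(x)) / sr
    return x + level * np.sin(2 * np.pi * freq * t)

def add_hiss(x, level, rng=None):
    """Additive broadband Gaussian noise."""
    rng = np.random.default_rng(rng)
    return x + level * rng.standard_normal(len(x))

def add_clipping(x, threshold):
    """Hard-clip the waveform at +/- threshold."""
    return np.clip(x, -threshold, threshold)
```

Sweeping the level or threshold parameters reproduces the progressively increasing degradations of the effectiveness test, while fixing them and re-randomizing the noise reproduces the robustness test's fixed-level, fluctuating condition.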
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method

Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than a single, potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced feedback delay network (FDN) is designed to generate perceptually plausible room impulse responses (RIRs). A multiple stimuli with hidden reference
and anchor (MUSHRA) test and a two-alternative-forced-choice
(2AFC) discrimination task have been conducted to compare the
proposed method against ground truth recordings and conventional
RT-based approaches. The results show that the proposed system
delivers robust performance in various scenarios, achieving highly
plausible reverberation synthesis.
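To make the FDN half of such a hybrid concrete, a minimal four-line feedback delay network with an orthogonal Hadamard feedback matrix can be sketched as follows. The delay lengths and loop gain are illustrative assumptions, and the paper's advanced FDN is certainly more elaborate (e.g., frequency-dependent absorption).

```python
import numpy as np

def fdn_reverb(x, delays=(149, 211, 263, 293), gain=0.7):
    """Minimal 4-line feedback delay network: mutually prime delay lines
    mixed through an orthonormal Hadamard matrix scaled by a loop gain
    (gain < 1 guarantees a decaying response)."""
    n = len(delays)
    h = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]], dtype=float) / 2.0
    bufs = [np.zeros(d) for d in delays]
    idx = [0] * n
    y = np.zeros(len(x))
    for t in range(len(x)):
        # Read the delay-line outputs, sum them to the wet signal.
        outs = np.array([bufs[i][idx[i]] for i in range(n)])
        y[t] = outs.sum()
        # Feed back through the mixing matrix and write the new samples.
        fb = gain * (h @ outs)
        for i in range(n):
            bufs[i][idx[i]] = x[t] + fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return y
```

In the hybrid scheme described above, the RT stage would supply the early reflections while a structure like this supplies the dense, cheap late reverberation.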
DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation

Reverberation is crucial in the acoustical design of physical
spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that
can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of
RESs strongly depends on the properties of the physical room and
the architecture of the Digital Signal Processor (DSP). However,
room-impulse-response (RIR) measurements and the DSP code
from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present
DataRES and PyRES—a RIR dataset and a Python library to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and
professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs.
The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like
pipeline. The replication of previous studies by the authors shows
that PyRES can become a useful tool in future research on RESs.
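PyRES's actual API is not shown here, but one basic building block of any RES evaluation pipeline is estimating reverberation time from a measured RIR. A self-contained sketch using Schroeder backward integration (a T20-style fit between assumed evaluation bounds, extrapolated to 60 dB) could look like:

```python
import numpy as np

def rt60_schroeder(rir, sr, db_hi=-5.0, db_lo=-25.0):
    """Estimate RT60 from a room impulse response via Schroeder backward
    integration: fit a line to the energy decay curve between db_hi and
    db_lo (dB) and extrapolate the slope to a 60 dB decay."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]          # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(rir)) / sr
    mask = (edc_db <= db_hi) & (edc_db >= db_lo)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope
```

Comparing such estimates between the passive room and the RES-driven room is one way to quantify how much reverberation enhancement the system achieves.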
Auditory Discrimination of Early Reflections in Virtual Rooms

This study investigates the perceptual sensitivity to early reflection changes across different spatial directions in a virtual
reality (VR) environment. Using an ABX discrimination paradigm, participants evaluated speech stimuli convolved with third-order Ambisonic room impulse responses under three position-reversal conditions (Left–Right, Front–Back, and Floor–Ceiling) and three
reverberation conditions (RT60 = 1.0 s, 0.6 s, and 0.2 s). Binomial tests revealed that participants consistently detected early reflection differences in the Left–Right reversal, while discrimination performance in the other two directions remained at or near
chance. This result can be explained by the higher acuity and lower localisation blur of the human auditory system for lateral source displacements. A
two-way ANOVA confirmed a significant main effect of spatial
position (p = 0.00685, η² = 0.1605), with no significant effect of
reverberation or interaction. The analysis of the binaural room
impulse responses showed waveform and direct-to-reverberant ratio (DRR) differences in the Left–Right reversal position, aligning with the perceptual results. However, no definitive causal link between DRR variations and perceptual outcomes can yet be established.
Biquad Coefficients Optimization via Kolmogorov-Arnold Networks

Conventional Deep Learning (DL) approaches to Infinite Impulse Response (IIR) filter coefficient estimation from an arbitrary frequency response are quite limited. They often suffer from inefficiencies such as tight training requirements, high complexity, and
limited accuracy. As an alternative, in this paper, we explore the
use of Kolmogorov-Arnold Networks (KANs) to predict the IIR
filter—specifically biquad coefficients—effectively. By leveraging the high interpretability and accuracy of KANs, we achieve smooth optimization of the coefficients. Furthermore, by constraining
the search space and exploring different loss functions, we demonstrate improved performance in speed and accuracy. Our approach
is evaluated against other existing differentiable IIR filter solutions. The results show significant advantages of KANs over existing methods, offering steadier convergences and more accurate
results. This offers new possibilities for integrating digital IIR filters into deep-learning frameworks.
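Independently of the network that predicts them, biquad coefficients can be scored against a target response with a spectral loss of the kind such training pipelines minimize. The following plain-NumPy sketch (the loss definition is an illustrative assumption, not the paper's objective) evaluates a biquad's response and its log-magnitude error:

```python
import numpy as np

def biquad_response(b, a, n_points=256):
    """Complex frequency response of a biquad H(z) = B(z)/A(z),
    b = (b0, b1, b2), a = (a1, a2), on the upper unit semicircle."""
    w = np.linspace(0, np.pi, n_points)
    z = np.exp(1j * w)
    num = b[0] + b[1] / z + b[2] / z ** 2
    den = 1.0 + a[0] / z + a[1] / z ** 2
    return num / den

def log_mag_loss(b, a, target_db):
    """Mean-squared error between the biquad's log-magnitude response
    and a target curve in dB -- the kind of objective a coefficient
    predictor could be trained against."""
    h = biquad_response(b, a, len(target_db))
    mag_db = 20 * np.log10(np.abs(h) + 1e-12)
    return np.mean((mag_db - target_db) ** 2)
```

Constraining the predicted (a1, a2) to the biquad stability triangle, as the search-space restriction above suggests, keeps every candidate filter usable during optimization.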
Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values

Audio effects and sound synthesizers are widely used processors
in popular music.
Their parameters control the quality of the
output sound. Multiple combinations of parameters can lead to
the same sound.
While recent approaches have been proposed
to estimate these parameters given only the output sound, these approaches are deterministic, i.e., they only estimate a single solution among
the many possible parameter configurations.
In this work, we
propose to model the parameters as probability distributions instead
of deterministic values. To learn the distributions, we optimize
two objectives: (1) we minimize the reconstruction error between
the ground truth output sound and the one generated using the
estimated parameters, asisit usuallydone, but also(2)we maximize
the parameter diversity, using entropy. We evaluate our approach
through two numerical audio experiments to show its effectiveness.
These results show how our approach effectively outputs multiple
combinations of parameters to match one sound.
A Parametric Equalizer with Interactive Poles and Zeros Control for Digital Signal Processing Education

This article presents ZePolA, a digital audio equalizer designed
as an educational resource for understanding digital filter design.
Unlike conventional equalization plug-ins, which define the frequency response first and then derive the filter coefficients, this
software adopts an inverse approach: users directly manipulate the
placement of poles and zeros on the complex plane, with the corresponding frequency response visualized in real time. This methodology provides an intuitive link between theoretical filter concepts
and their practical application. The plug-in features three main
panels: a filter parameter panel, a frequency response panel, and a
filter design panel. It allows users to configure a cascade of first- or second-order filter elements, each parameterized by the location of its poles or zeros. The GUI supports interaction through
drag-and-drop gestures, enabling immediate visual and auditory
feedback. This hands-on approach is intended to enhance learning
by bridging the gap between theoretical knowledge and practical
application. To assess the educational value and usability of the
plug-in, a preliminary evaluation was conducted with focus groups
of students and lecturers. Future developments will include support for additional filter types and increased architectural flexibility. Moreover, a systematic validation study involving students
and educators is proposed to quantitatively evaluate the plug-in’s
impact on learning outcomes. This work contributes to the field
of digital signal processing education by offering an innovative
tool that merges the hands-on approach of music production with
a deeper theoretical understanding of digital filters, fostering an
interactive and engaging educational experience.
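The inverse approach described above amounts to evaluating H(z) directly from user-placed zeros and poles. A minimal sketch (NumPy, illustrative and ignoring ZePolA's cascade structure and GUI) is:

```python
import numpy as np

def response_from_zpk(zeros, poles, gain=1.0, n_points=512):
    """Frequency response of a filter defined directly by its z-plane
    zeros and poles: H(e^jw) = gain * prod(e^jw - z_i) / prod(e^jw - p_i).
    For a real-coefficient filter, complex zeros/poles must appear in
    conjugate pairs."""
    w = np.linspace(0, np.pi, n_points)
    ejw = np.exp(1j * w)
    num = np.ones_like(ejw)
    for z in zeros:
        num *= ejw - z
    den = np.ones_like(ejw)
    for p in poles:
        den *= ejw - p
    return w, gain * num / den
```

Placing a zero at z = -1, for instance, immediately nulls the response at the Nyquist frequency, which is exactly the cause-and-effect link between pole/zero placement and frequency response that the plug-in visualizes in real time.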
Zero-Phase Sound via Giant FFT

Given the speedy computation of the FFT in current computer
hardware, there are new possibilities for examining transformations for very long sounds. A zero-phase version of any audio
signal can be obtained by zeroing the phase angle of its complex
spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to
enhance the resulting sound quality. As a result, a sound with the
same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are
retained, however. Zero-phase sounds are palindromic in the sense
that they are symmetric in time. A comparison of the zero-phase
conversion to the autocorrelation function helps to understand its
properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same
autocorrelation function as the original sound. One exciting variation is to apply the method separately to the real and imaginary parts of the spectrum to produce a stereo effect. A
frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is
another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
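The core conversion is compact enough to state directly. The sketch below zero-pads, discards the phase by keeping only the spectral magnitude, and centres the resulting palindrome; the transient-suppression and gain-compensation steps recommended above are omitted for brevity.

```python
import numpy as np

def zero_phase(x, pad_factor=2):
    """Zero-phase version of a signal: zero-pad, take the FFT, keep only
    the magnitude (zeroing the phase angle), and invert. The result is
    circularly shifted so its symmetry centre sits mid-buffer."""
    n = pad_factor * len(x)
    mag = np.abs(np.fft.fft(x, n))
    y = np.fft.ifft(mag).real   # imaginary part ~0: magnitude is even
    return np.fft.fftshift(y)   # centre the time-domain palindrome
```

Because only the phase is discarded, the output keeps the spectral magnitude (and hence the autocorrelation) of the padded original while being symmetric in time, matching the palindromic property described above.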
Partiels – Exploring, Analyzing and Understanding Sounds

This article presents Partiels, an open-source application developed at IRCAM to analyze digital audio files and explore sound characteristics.
The application uses Vamp plug-ins to
extract various information on different aspects of the sound, such
as spectrum, partials, pitch, tempo, text, and chords. Partiels is the
successor to AudioSculpt, offering a modern, flexible interface for
visualizing, editing, and exporting analysis results, addressing a
wide range of issues from musicological practice to sound creation
and signal processing research. The article describes Partiels’ key
features, including analysis organization, audio file management,
results visualization and editing, as well as data export and sharing
options, and its interoperability with other software such as Max
and Pure Data. In addition, it highlights the numerous analysis
plug-ins developed at IRCAM, based in particular on machine
learning models, as well as the IRCAM Vamp extension, which
overcomes certain limitations of the original Vamp format.