Audio Visualization via Delay Embedding and Subspace Learning

Alois Cerbu; Carmine Cella
DAFx-2024 - Guildford
We describe a sequence of methods for producing videos from audio signals. Our visualizations capture perceptual features like harmonicity and brightness: they produce stable images from periodic sounds and slowly-evolving images from inharmonic ones; they associate jagged shapes to brighter sounds and rounded shapes to darker ones. We interpret our methods as adaptive FIR filterbanks and show how, for larger values of the complexity parameters, we can perform accurate frequency detection without the Fourier transform. Attached to the paper is a code repository containing the Jupyter notebook used to generate the images and videos cited. We also provide code for a realtime C++ implementation of the simplest visualization method. We discuss the mathematical theory of our methods in the two appendices.
Download