SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications
Machine learning for sound generation is rapidly expanding within the computer music community. However, most datasets used to train models are built from field recordings, foley sounds, instrumental notes, or commercial music. This presents a significant limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes. To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1,000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset, distributed under Creative Commons licenses, features annotations combining technical and poetic descriptions, alongside classifications based on predefined spectromorphological categories.
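To make the annotation scheme concrete, the sketch below shows how a single sound object's metadata might be represented in code. This is purely illustrative: the field names, category labels, and accessor function are assumptions for this example, not the dataset's actual schema.

```python
# Hypothetical record for one annotated sound object. Field names and
# values are illustrative assumptions, not the SCHAEFFER schema itself.
record = {
    "id": "schaeffer_0001",
    "license": "CC BY 4.0",                      # Creative Commons license
    "technical_description": "Granular texture with a slow spectral glide.",
    "poetic_description": "A swarm of glass fragments dispersing in air.",
    "spectromorphology": ["iterative", "graduated continuant"],
}

def categories(rec: dict) -> list[str]:
    """Return the predefined spectromorphological categories of a record."""
    return rec["spectromorphology"]

print(categories(record))
```

A record like this pairs the two annotation layers (technical and poetic free text) with the closed vocabulary of spectromorphological categories, so the categorical labels can drive classification tasks while the free text supports retrieval or captioning.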