A High-Level Audio Feature for Music Retrieval and Sorting

Tim Pohle; Peter Knees; Klaus Seyerlehner; Gerhard Widmer
DAFx-2010 - Graz
We describe an audio analysis method to create a high-level audio annotation, expressed as a single scalar. Typically, low values of this feature indicate songs with dominant harmonic elements while high values indicate the dominance of mainly percussive or drum-like sounds. The proposed feature is based on a simple idea: Filters known from image processing are used to extract attack and harmonic parts of the spectrum, and the ratio of their overall strengths is used as the final feature. The feature takes values in the unit range, and is highly independent of the overall loudness. We present a number of experiments that indicate the potential of the proposed feature. A suggested application scenario is to write the feature value into the comments field of an audio file, so that it can be used by a number of existing audio players in conjunction with metadata-based search mechanisms, most notably genre.