This paper proposes an algorithm for aligning singing in polyphonic music audio with textual lyrics. As preprocessing, the system uses a voice separation algorithm based on melody transcription and sinusoidal modeling. The alignment is based on a hidden Markov model speech recognizer whose acoustic model is adapted to the singing voice. The textual input is preprocessed to create a language model consisting of a sequence of phonemes, pauses, and possible instrumental breaks. The Viterbi algorithm is used to align the audio features with the text. On a test set consisting of 17 commercial recordings, the system achieves an average absolute error of 1.40 seconds in aligning lines of the lyrics.
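To make the alignment step concrete, the following is a minimal sketch of Viterbi forced alignment over a left-to-right phoneme-state sequence, of the kind the abstract describes. It is not the authors' implementation: the function name, the per-frame log-likelihood matrix, and the assumption of uniform self-loop/advance transition probabilities are illustrative only.

```python
import numpy as np

def viterbi_forced_alignment(log_likelihoods):
    """Illustrative forced alignment of audio frames to a phoneme sequence.

    log_likelihoods: (n_frames, n_states) array of per-frame log-likelihoods,
    one column per phoneme/pause state in lyric order (left-to-right HMM).
    Transition probabilities are assumed uniform and omitted for brevity.
    Returns the most likely state index for each frame.
    """
    n_frames, n_states = log_likelihoods.shape
    NEG_INF = -np.inf
    # delta[t, s]: best log-score of any path ending in state s at frame t
    delta = np.full((n_frames, n_states), NEG_INF)
    backptr = np.zeros((n_frames, n_states), dtype=int)

    delta[0, 0] = log_likelihoods[0, 0]  # alignment must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            # left-to-right topology: either stay in state s or advance from s-1
            stay = delta[t - 1, s]
            advance = delta[t - 1, s - 1] if s > 0 else NEG_INF
            if stay >= advance:
                delta[t, s] = stay + log_likelihoods[t, s]
                backptr[t, s] = s
            else:
                delta[t, s] = advance + log_likelihoods[t, s]
                backptr[t, s] = s - 1

    # Backtrack from the final state at the last frame
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(backptr[t, path[-1]])
    return path[::-1]

# Toy usage: 100 frames aligned against a 10-state phoneme sequence
scores = np.log(np.random.rand(100, 10))
frame_to_state = viterbi_forced_alignment(scores)
```

Given such a frame-to-state path, line-level timestamps follow by reading off the frames at which the states belonging to each lyric line begin and end.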