This paper proposes an algorithm for aligning singing in polyphonic music audio with textual lyrics. As preprocessing, the system uses a voice separation algorithm based on melody transcription and sinusoidal modeling. The alignment is based on a hidden Markov model speech recognizer whose acoustic model is adapted to the singing voice. The textual input is preprocessed to create a language model consisting of a sequence of phonemes, pauses, and possible instrumental breaks. The Viterbi algorithm is used to align the audio features with the text. On a test set consisting of 17 commercial recordings, the system achieves an average absolute error of 1.40 seconds in aligning lines of the lyrics.
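To make the alignment step concrete, the following is a minimal sketch of Viterbi forced alignment over a left-to-right phoneme-state sequence, of the kind the abstract describes. It is not the authors' implementation: the function name, the per-frame log-likelihood matrix, and the assumption of uniform self-loop/advance transition probabilities are illustrative only.

```python
import numpy as np

def viterbi_forced_alignment(log_likelihoods):
    """Illustrative forced alignment of audio frames to a phoneme sequence.

    log_likelihoods: (n_frames, n_states) array of per-frame log-likelihoods,
    one column per phoneme/pause state in lyric order (left-to-right HMM).
    Transition probabilities are assumed uniform and omitted for brevity.
    Returns the most likely state index for each frame.
    """
    n_frames, n_states = log_likelihoods.shape
    NEG_INF = -np.inf
    # delta[t, s]: best log-score of any path ending in state s at frame t
    delta = np.full((n_frames, n_states), NEG_INF)
    backptr = np.zeros((n_frames, n_states), dtype=int)

    delta[0, 0] = log_likelihoods[0, 0]  # alignment must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            # left-to-right topology: either stay in state s or advance from s-1
            stay = delta[t - 1, s]
            advance = delta[t - 1, s - 1] if s > 0 else NEG_INF
            if stay >= advance:
                delta[t, s] = stay + log_likelihoods[t, s]
                backptr[t, s] = s
            else:
                delta[t, s] = advance + log_likelihoods[t, s]
                backptr[t, s] = s - 1

    # Backtrack from the final state at the last frame
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(backptr[t, path[-1]])
    return path[::-1]

# Toy usage: 100 frames aligned against a 10-state phoneme sequence
scores = np.log(np.random.rand(100, 10))
frame_to_state = viterbi_forced_alignment(scores)
```

Given such a frame-to-state path, line-level timestamps follow by reading off the frames at which the states belonging to each lyric line begin and end.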