Non-Parallel Singing-Voice Conversion by Phoneme-based Mapping and Covariance Approximation

Fernando Villavicencio; Hideki Kenmochi
DAFx-2011 - Paris
In this work we present an approach to voice timbre conversion from unpaired data. Voice conversion strategies are commonly restricted to the use of parallel speech corpora. Our proposal is based on two main concepts: modeling the timbre space using phonetic information, and a simple approximation of the cross-covariance of source-target features. Experimental results applying this strategy to singing-voice data of the VOCALOID synthesizer showed a conversion performance comparable to that obtained by Maximum-Likelihood, thereby allowing us to achieve singer-timbre conversion from real singing performances.