Onset-Informed Source Separation Using Non-Negative Matrix Factorization With Binary Masks

Yuta Kusaka; Katsutoshi Itoyama; Kenji Nishida; Kazuhiro Nakadai
DAFx-2020 - Vienna (virtual)
This paper describes a new onset-informed source separation method based on non-negative matrix factorization (NMF) with binary masks. Many previous approaches to separating a target instrument sound from polyphonic music have relied on side information about the target that is time-consuming to prepare. The proposed method instead leverages the onsets of the target instrument sound to facilitate separation. Onsets are useful information that users can easily provide by tapping along while listening to the target in the music. To utilize onsets in NMF-based sound source separation, we introduce binary masks that represent the on/off states of the target sound. The binary masks are formulated as Markov chains, reflecting the temporal continuity of musical instrument sounds. With the binary masks, an onset can be handled as a time frame in which the mask changes from the off state to the on state. The proposed model is inferred by Gibbs sampling, in which the target sound source can be sampled efficiently by using its onsets. We conducted experiments to separate the target melody instrument from recorded polyphonic music. The separation results showed an improvement of about 2 to 10 dB in the target source to residual noise ratio compared with the input polyphonic sound. The method remained effective for target sound source separation even when some onsets were missed or shifted in time.
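The abstract describes the on/off state of the target as a binary Markov chain in which each onset marks an off-to-on transition, sampled by Gibbs sampling. The Python sketch below illustrates only that mask-sampling idea under loudly stated assumptions; it is not the authors' implementation. The function `gibbs_binary_mask`, the stay probabilities `p_stay_on`/`p_stay_off`, and the per-frame log-likelihoods `loglik_on`/`loglik_off` (standing in for the NMF reconstruction likelihood of each state) are all hypothetical. Tapped onsets are handled here by simply clamping the mask to off at frame `t - 1` and on at frame `t`; all other frames are resampled from their full conditional given both neighbors.

```python
import numpy as np


def gibbs_binary_mask(loglik_on, loglik_off, onsets,
                      p_stay_on=0.9, p_stay_off=0.9, n_iter=50, seed=0):
    """Hypothetical sketch: Gibbs-sample a binary on/off mask s[0..T-1].

    The mask is a two-state Markov chain; user-tapped onset frames are
    clamped so that s[t-1] = 0 and s[t] = 1 (an off-to-on transition).
    loglik_on / loglik_off are assumed per-frame log-likelihoods of the
    observed spectrogram frame under the "on" and "off" states.
    """
    rng = np.random.default_rng(seed)
    T = len(loglik_on)
    # Log transition matrix: trans[prev_state, next_state], state 0 = off, 1 = on.
    trans = np.log(np.array([[p_stay_off, 1.0 - p_stay_off],
                             [1.0 - p_stay_on, p_stay_on]]))

    # Clamp the frame before each onset to off, then each onset frame to on
    # (onset frames take precedence if two onsets are adjacent).
    clamped = {t - 1: 0 for t in onsets if t > 0}
    clamped.update({t: 1 for t in onsets})

    s = np.zeros(T, dtype=int)
    for t, value in clamped.items():
        s[t] = value

    for _ in range(n_iter):
        for t in range(T):
            if t in clamped:
                continue  # onset information fixes these frames
            # Unnormalized log p(s_t | s_{t-1}, s_{t+1}, x_t) for s_t in {0, 1}.
            logp = np.array([loglik_off[t], loglik_on[t]], dtype=float)
            if t > 0:
                logp += trans[s[t - 1], :]
            if t < T - 1:
                logp += trans[:, s[t + 1]]
            # Two-class softmax; clip to avoid overflow in exp.
            diff = np.clip(logp[0] - logp[1], -50.0, 50.0)
            p_on = 1.0 / (1.0 + np.exp(diff))
            s[t] = int(rng.random() < p_on)
    return s


# Usage: 12 frames; the target is strongly "on" in frames 3-6 and 8-10,
# and the user tapped onsets at frames 3 and 8.
on_frames = np.zeros(12, dtype=bool)
on_frames[[3, 4, 5, 6, 8, 9, 10]] = True
loglik_on = np.where(on_frames, 0.0, -20.0)
loglik_off = np.where(on_frames, -20.0, 0.0)
mask = gibbs_binary_mask(loglik_on, loglik_off, onsets=[3, 8])
print(mask)
```

In an NMF model of this kind, such a mask would gate the activations of the target's basis vectors frame by frame; that coupling, and how the actual method treats off-to-on transitions at frames with no tapped onset, are not specified in the abstract and are left out of this sketch.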