An Audio-Visual Fusion Piano Transcription Approach Based on Strategy

Xianke Wang; Wei Xu; Juanting Liu; Weiming Yang; Wenqing Cheng
DAFx-2021 - Vienna (virtual)
Piano transcription is a fundamental problem in music information retrieval. Most existing transcription studies rely on audio or video alone, and few address audio-visual fusion. This paper proposes a piano transcription model based on strategy fusion, in which the transcription results of the video model are used to assist audio transcription. Because datasets suitable for audio-visual fusion are lacking, the paper also introduces the OMAPS dataset. The strategy fusion model achieves a 92.07% F1 score on the OMAPS dataset. A transcription model based on feature fusion is also compared with the one based on strategy fusion, and the experimental results show that the strategy fusion model achieves better results than the feature fusion model.