Transition-Aware: A More Robust Approach for Piano Transcription

Piano transcription is a classic problem in music information retrieval, and many transcription methods based on deep learning have been proposed in recent years. In 2019, Google Brain published a large piano transcription dataset, MAESTRO, on which the Onsets and Frames transcription approach proposed by Hawthorne achieved a stunning onset F1 score of 94.73%. Unlike the annotation method of Onsets and Frames, the Transition-aware model presented in this paper annotates the attack process of piano signals, called the attack transition, across multiple frames instead of marking only the onset frame. In this way, the piano signal around the onset time is taken into account, making onset detection more stable and robust. Transition-aware achieves a higher transcription F1 score than Onsets and Frames on both the MAESTRO and MAPS datasets and reduces many extra-note detection errors, indicating that the Transition-aware approach generalizes better across datasets.
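The core labeling difference described above can be illustrated with a minimal sketch: a single-frame onset target versus a multi-frame attack-transition target. The function names and the transition width are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: single-frame onset labels (Onsets-and-Frames style)
# vs. multi-frame "attack transition" labels (Transition-aware style).
# Names and the window width are illustrative assumptions.

def onset_labels(onset_frame, n_frames):
    """Mark only the single frame containing the note onset."""
    return [1 if t == onset_frame else 0 for t in range(n_frames)]

def transition_labels(onset_frame, n_frames, width=3):
    """Mark all frames within `width` of the onset, so the attack
    process of the piano signal spans several positive frames."""
    return [1 if abs(t - onset_frame) < width else 0
            for t in range(n_frames)]

print(onset_labels(5, 10))       # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(transition_labels(5, 10))  # [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```

With multiple positive frames per note, a detector that misses the exact onset frame by one or two frames can still fire, which is one intuition for the added robustness.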
An Audio-Visual Fusion Piano Transcription Approach Based on Strategy Fusion

Piano transcription is a fundamental problem in the field of music information retrieval. At present, most transcription studies are based on audio or video alone, and audio-visual fusion has received little discussion. In this paper, a piano transcription model based on strategy fusion is proposed, in which the transcription results of the video model are used to assist audio transcription. Because of the lack of datasets suitable for audio-visual fusion, the OMAPS dataset is also introduced in this paper, on which our strategy fusion model achieves a 92.07% F1 score. A transcription model based on feature fusion is compared with the one based on strategy fusion; the experimental results show that strategy fusion achieves better results than feature fusion.
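Strategy (decision-level) fusion, as opposed to feature fusion, combines the two models' outputs rather than their internal features. A minimal sketch of one plausible rule, in which video detections confirm audio notes, is below; the matching rule, tolerance, and note representation are assumptions, not the paper's actual method.

```python
# Hedged sketch of decision-level ("strategy") fusion: the video
# model's detections are used to confirm the audio model's notes.
# Notes are (midi_pitch, onset_seconds) pairs; the pitch-and-onset
# matching rule and the 50 ms tolerance are illustrative assumptions.

def strategy_fusion(audio_notes, video_notes, tol=0.05):
    """Keep an audio note only if the video model detected the same
    pitch with an onset within `tol` seconds."""
    fused = []
    for pitch, onset in audio_notes:
        if any(vp == pitch and abs(vo - onset) <= tol
               for vp, vo in video_notes):
            fused.append((pitch, onset))
    return fused

audio = [(60, 1.00), (64, 1.02), (67, 2.50)]
video = [(60, 1.01), (67, 2.48)]
print(strategy_fusion(audio, video))  # [(60, 1.0), (67, 2.5)]
```

In contrast, feature fusion would concatenate audio and video features inside a single network before prediction, which is the alternative the paper compares against.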
DDSP-Based Neural Waveform Synthesis of Polyphonic Guitar Performance From String-Wise MIDI Input

We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose four different systems and compare them, using both objective metrics and subjective evaluation, against natural audio and a sample-based baseline. We iteratively develop these four systems by making various considerations about the architecture and intermediate tasks, such as predicting pitch and loudness control features. We find that formulating the control-feature prediction task as a classification task rather than a regression task yields better results. Furthermore, we find that our simplest proposed system, which directly predicts synthesis parameters from MIDI input, performs best of the four proposed systems. Audio examples and code are available.
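The classification-versus-regression finding above can be illustrated by quantizing a continuous control feature such as loudness into discrete bins, so the network predicts a bin index instead of a raw value. The bin count and loudness range below are assumptions for illustration, not values from the paper.

```python
# Hedged sketch: turning continuous loudness prediction into a
# classification task by quantizing dB values into bins.
# NUM_BINS and the [-80, 0] dB range are illustrative assumptions.

NUM_BINS, LO, HI = 64, -80.0, 0.0

def loudness_to_class(db):
    """Map a continuous loudness value (dB) to a bin index."""
    db = max(LO, min(HI, db))  # clamp into the assumed range
    return min(int((db - LO) / (HI - LO) * NUM_BINS), NUM_BINS - 1)

def class_to_loudness(idx):
    """Map a bin index back to its bin-centre loudness (dB)."""
    return LO + (idx + 0.5) * (HI - LO) / NUM_BINS

print(loudness_to_class(-20.0))     # 48
print(class_to_loudness(48))        # -19.375
```

A classifier over such bins can model multimodal control-feature distributions, which is one common rationale for preferring classification over regression here.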