
An Audio Event Detection Method Robust to Inaccurate Timestamps by Limiting Event Boundary Intervals

Abstract

This thesis addresses label noise in audio event detection (AED) by refining strong labels with inaccurate timestamps into sequential labels. In AED, a strong label specifies which event occurs in an audio clip, together with timestamps marking the start and end of that event. These timestamps are valuable for training a model, but label noise is inevitable because event boundaries are ambiguous or depend on each annotator's subjective judgment. To avoid the performance degradation caused by this label noise, we propose an AED scheme that converts the strong labels into sequential labels and trains with both the sequential labels and the given strong labels. To fully exploit the information in the available strong labels when calculating the sequential loss, we additionally propose a sequential loss calculation method that takes the error-prone time information into account. Because sequential labels retain only the order information refined from the strong labels, using strong and sequential labels together emphasizes the accurate information in the strong labels and reduces the effect of the label noise. In addition, by using the timestamps of the strong labels to limit the frame intervals in which event boundaries can occur, we trained the model more efficiently. Experimental results on DCASE 2019 Task 4 demonstrated that the proposed method successfully mitigates the label noise.
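The two label-handling ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the sequential-label format or how the boundary interval is chosen, so the `_start`/`_end` token names, the function names, and the `hop_sec` and `tol_sec` parameters below are all assumptions made for illustration.

```python
import numpy as np


def _sorted_boundaries(events):
    """Flatten (label, onset_sec, offset_sec) events into time-ordered boundaries."""
    boundaries = []
    for label, onset, offset in events:
        boundaries.append((onset, f"{label}_start"))  # hypothetical token naming
        boundaries.append((offset, f"{label}_end"))
    # Sort by time only; Python's sort is stable, so ties keep insertion order.
    boundaries.sort(key=lambda b: b[0])
    return boundaries


def strong_to_sequential(events):
    """Drop the timestamps and keep only the order of event boundaries."""
    return [token for _, token in _sorted_boundaries(events)]


def boundary_windows(events, n_frames, hop_sec, tol_sec):
    """For each boundary, mark the frames in which it is allowed to occur.

    Assumption: the allowed interval is the annotated timestamp plus or
    minus tol_sec, converted to frames with a frame hop of hop_sec.
    """
    half = int(round(tol_sec / hop_sec))
    windows = []
    for time, token in _sorted_boundaries(events):
        center = int(round(time / hop_sec))
        lo = max(0, center - half)
        hi = min(n_frames, center + half + 1)
        mask = np.zeros(n_frames, dtype=bool)
        mask[lo:hi] = True  # frames outside the window stay excluded
        windows.append((token, mask))
    return windows


# Two overlapping events in a 5-second clip with 0.5-second frames.
events = [("Speech", 1.0, 3.0), ("Dog", 2.0, 4.0)]
sequence = strong_to_sequential(events)
windows = boundary_windows(events, n_frames=10, hop_sec=0.5, tol_sec=0.5)
```

In this sketch, `sequence` keeps only the boundary order, so a timestamp that is off by a few frames does not change the sequential label. A sequential loss could then use each mask in `windows` to restrict where the corresponding boundary is searched for, which is one plausible reading of "limiting the frame interval" in the abstract.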
