Following the resurgence of machine learning within the context of autonomous driving, the need for acquiring and labeling data expanded by folds. Despite the large amount of available visual data (images, point clouds, . . . ), researchers apply augmentation techniques to extend the training dataset, which improves the classification accuracy. When trying to exploit audio data for autonomous driving, two challenges immediately surfaced: first, the lack of available data and second, the absence of augmentation techniques. In this paper we introduce a series of augmentation techniques suitable for audio data. We apply several procedures, inspired by data augmentation for image classification, that transform and distort the original data to produce similar effects on sound. We show the increase in overall accuracy of our neural network for sound classification by comparing it to the non-augmented version.
Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.