We propose a speech classification system intended to make online content accessible to younger children. Typical interfaces such as search engines or hierarchical navigation are inappropriate for a young child, so the system is a voice classifier trained to recognise a range of sounds and vocabulary typical of younger children. As an example, we design a system for classifying animal noises. Acoustic features are extracted from a corpus of animal noises made by a class of young children, and a Support Vector Machine is trained to classify each sound as one of 12 corresponding animals. We investigate the precision and recall of the classifier for various classification parameters. We also investigate which features to extract from the audio, comparing performance when using mean Mel-frequency Cepstral Coefficients (MFCCs), a single-Gaussian model fitted to the MFCCs, and a range of temporal features. To assess the real-world applicability of the system, we pay particular attention to the difference between a generic classifier trained on a collected corpus of examples and one trained on a particular voice.
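As an illustration only, the two MFCC summaries compared in the abstract and the per-class precision and recall measures could be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes the per-frame MFCC vectors have already been computed elsewhere, the function names `mean_mfcc`, `gaussian_mfcc` and `precision_recall` are hypothetical, and the choice to represent the single-Gaussian model as the mean vector plus the upper triangle of the sample covariance is an assumption. The resulting fixed-length vectors would then be passed to a Support Vector Machine, which is not shown here.

```python
from statistics import fmean


def mean_mfcc(frames):
    """Average each coefficient over time: one value per MFCC dimension.

    `frames` is a list of per-frame MFCC vectors (assumed precomputed).
    """
    dims = len(frames[0])
    return [fmean(f[d] for f in frames) for d in range(dims)]


def gaussian_mfcc(frames):
    """Single-Gaussian summary of the frames (assumed representation):
    the mean vector followed by the upper triangle of the sample
    covariance matrix. Requires at least two frames.
    """
    mu = mean_mfcc(frames)
    n, dims = len(frames), len(mu)
    feats = list(mu)
    for i in range(dims):
        for j in range(i, dims):
            cov = sum((f[i] - mu[i]) * (f[j] - mu[j]) for f in frames) / (n - 1)
            feats.append(cov)
    return feats


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class label (e.g. one animal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

The mean summary discards how the coefficients vary within a sound, whereas the Gaussian summary also keeps their variances and correlations at the cost of a longer feature vector: for d coefficients it has d + d(d + 1)/2 entries instead of d.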