Inter-program audio level discrepancies continue to plague the content creation and broadcast industries. In particular, end users are often compelled to adjust audio playback levels themselves. It has previously been established that leveling programs based on dialogue loudness can improve listener satisfaction; however, dialogue loudness is difficult to measure because it requires a human to continuously monitor an audio stream and measure loudness only during the speech portions of the content. This paper presents an automated speech discrimination system that detects the portions of audio content that contain primarily speech. The speech/other discrimination system takes advantage of well-known speech characteristics to achieve a total error of 3% despite delay and computational limitations.