Synthesising Prosody with Variable Resolution
This paper presents a technique for synthesising prosody based upon information extracted from spoken utterances. We are interested in designing systems that learn how to speak autonomously by interacting with humans. Our in-depth investigation into prosody is motivated by the observation that infants appear to listen acutely to prosody during the first months of life. We presume that any system aimed at learning some form of speaking skill should display this fundamental capacity. The paper addresses two fundamental components for the development of such systems: prosody listening and prosody production. It begins with a brief introduction to the problem within the context of our research objectives, then introduces the system and presents some commented examples. It concludes with final remarks and a brief discussion of future developments.