This paper analyzes the pitch fluctuations of different notes in Taiwanese singing in order to build a fundamental frequency (F0) control model, based on note type, that improves the naturalness of synthesized Taiwanese singing voice by producing more natural F0 contours. The factors that significantly differentiate singing synthesis from speech synthesis must be taken into consideration when designing a singing synthesizer. Among these, the F0 contour is an important feature that strongly affects singing voice perception and needs to be controlled precisely. A sung F0 contour contains fluctuations rather than following a predefined stepwise pitch curve derived from the musical notes. These fluctuations are important features that should be taken into consideration in singing-related applications such as singing synthesis, singing voice detection, performance analysis, singing/music recognition, singing style identification, and query-by-humming. Overshoot percentage and preparation percentage are proposed as measures of the extent of these fluctuations. Statistics for each note category were established from a corpus of Taiwanese nursery rhymes. The different extents of overshoot and preparation for the separate note categories were modeled for males, females, and children according to the statistical results. A PID controller that controls a second-order system is proposed so that the F0 contour adjusts quickly to the correct F0 level of each note and remains sufficiently steady at that level to produce a pleasant singing voice.
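As an illustration of the control idea described above, the following is a minimal sketch of a PID controller driving a second-order system through one note transition. It is not the authors' implementation: the gains (`kp`, `ki`, `kd`), the plant parameters (`omega`, `zeta`), the function names, and the choice to work in MIDI semitone units are all assumptions made for illustration. The overshoot measure is likewise an assumed definition (peak excursion past the target, as a percentage of the interval between the two notes), since the abstract does not state the paper's formula. Preparation is not modeled here.

```python
def midi_to_hz(midi_note):
    """Convert a MIDI note number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)


def simulate_transition(start_midi, target_midi, duration=0.5, dt=0.001,
                        kp=2.0, ki=40.0, kd=0.02, omega=40.0, zeta=0.7):
    """Simulate a pitch contour (in MIDI semitones) for one note transition.

    Assumed plant:  x'' + 2*zeta*omega*x' + omega^2 * x = omega^2 * u
    Assumed controller:  u = kp*e + ki*integral(e) + kd*de/dt,  e = 1 - x
    The state x is normalized: 0 is the previous note, 1 is the target note.
    All parameter values are illustrative, not taken from the paper.
    """
    step = target_midi - start_midi
    x, v = 0.0, 0.0            # normalized pitch and its rate of change
    integral = 0.0
    prev_err = 1.0 - x
    contour = []
    for _ in range(int(round(duration / dt))):
        err = 1.0 - x
        integral += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        u = kp * err + ki * integral + kd * deriv       # PID output
        acc = omega ** 2 * (u - x) - 2.0 * zeta * omega * v
        v += acc * dt                                   # semi-implicit Euler
        x += v * dt
        contour.append(start_midi + step * x)
    return contour


def overshoot_percentage(contour, start, target):
    """Assumed definition: how far the contour passes the target, as a
    percentage of the interval between the start and target notes."""
    step = target - start
    if step == 0:
        raise ValueError("start and target must differ")
    # Normalize so that 0 is the start note and 1 is the target note;
    # this handles upward and downward transitions alike.
    peak = max((y - start) / step for y in contour)
    return max(0.0, (peak - 1.0) * 100.0)


if __name__ == "__main__":
    contour = simulate_transition(60, 64)               # C4 up to E4
    print("final F0 (Hz):", round(midi_to_hz(contour[-1]), 2))
    print("overshoot (%):", round(overshoot_percentage(contour, 60, 64), 1))
```

Under this sketch, changing the gains per note category and per singer group would be one way to reproduce different overshoot extents, but how the paper maps its statistics onto controller parameters is not specified in the abstract.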