Superimpositional Model Of Intonation


Pitch modeling: Once the prosodic boundaries are identified, the speech synthesizer applies the prosodic elements, namely duration, intonation and intensity, to each phrase and to the sentence as a whole. The primary factors that contribute to intonation are the context of the words and the intended meaning of the sentence. Jurafsky [4] illustrates this with the utterance “oh, really”. Without varying the phrasing or stress, many variants of this utterance are still possible by varying the intonation alone. For example, we might have an excited version “oh, really!” (in the context of a reply to a statement that one has won the lottery), a skeptical version “oh, really?” in the context of not being sure whether the …

Intonation is also influenced by the gender, physical state, emotional state and attitude of the speaker. There are two approaches to the automatic generation of pitch patterns for synthetic speech. The superpositional approach considers an F0 contour as consisting of two or more superimposed components [34]. In this approach, the generated F0 contour is the sum of a global component that represents the intonation of the whole utterance and local components that model the change of F0 over the accented syllables. The second approach, called the linear approach, considers an F0 contour as a linear succession of tones. An example of the linear approach to pitch modeling is the Pierrehumbert or ToBI model, which describes a pitch contour in terms of pitch accents [35]. Pitch accents occur at stressed syllables and form characteristic patterns in the pitch contour. The ToBI model for English uses five pitch accents obtained by combining two simple tones, high (H) and low (L), in different ways. The model uses an H+L pattern to indicate a fall, an L+H pattern to describe a rise, and an asterisk (*) to indicate which tone falls on a stressed syllable. The five pitch accents are …
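The superpositional idea described above can be illustrated with a minimal sketch. It assumes a linearly declining global component and Gaussian-shaped local accent components, summed in Hz. These shapes, the function names and all numeric values are illustrative assumptions and are not taken from [34]; published superpositional models use their own component shapes and often work on a logarithmic frequency scale.

```python
import math

def phrase_component(t, start_hz=140.0, end_hz=100.0, duration=2.0):
    """Global component: a linear declination over the whole utterance (assumed shape)."""
    frac = min(max(t / duration, 0.0), 1.0)  # clamp position to [0, 1]
    return start_hz + (end_hz - start_hz) * frac

def accent_component(t, center, peak_hz=30.0, width=0.08):
    """Local component: a Gaussian bump centred on an accented syllable (assumed shape)."""
    return peak_hz * math.exp(-0.5 * ((t - center) / width) ** 2)

def f0_contour(times, accent_centers, duration=2.0):
    """F0 at each time point = global component + sum of all local components."""
    return [
        phrase_component(t, duration=duration)
        + sum(accent_component(t, c) for c in accent_centers)
        for t in times
    ]

# Hypothetical 2-second utterance sampled every 10 ms,
# with accented syllables centred at 0.4 s and 1.3 s.
times = [i * 0.01 for i in range(201)]
contour = f0_contour(times, accent_centers=[0.4, 1.3])
```

Because the components are simply added, the overall declination and the individual accents can be adjusted independently, which is the main attraction of the superpositional view.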

In one approach (O’Shaughnessy [36], Bartkova and Sorin [37]), the intrinsic duration of a speech unit is modified by successively applying rules derived from the analysis of speech data. Bartkova and Sorin [37] analyzed several corpora to study speaker-independent intrinsic durations and their modifications, and derived multiplicative rules and factors that modify an assigned baseline duration. In another approach, large speech corpora are first analyzed by varying a number of possible control factors simultaneously to obtain duration models, such as the additive duration model by Kaiki [38], CARTs by Riley [3] and neural networks by Campbell [39]. The CARTs (classification and regression trees) proposed by Riley are data-driven models that are constructed automatically and are capable of self-configuration. The CART algorithm sorts instances in the learning data using binary yes/no questions about the attributes of those instances. Starting at a root node, the algorithm builds a tree structure, selecting at each node the best attribute and the question to ask about it. The selection is based on what attribute and question will divide the learning data to give the …
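The rule-based approach described at the start of this paragraph, in which a baseline duration is modified by successively applying multiplicative factors, can be sketched as follows. The phone set, baseline values, rule names and factors are all hypothetical and chosen only for illustration; they are not the values reported in [36] or [37].

```python
# Hypothetical intrinsic (baseline) durations in milliseconds.
BASELINE_MS = {"a": 90.0, "t": 60.0, "s": 80.0}

# Hypothetical rules: (context feature, multiplicative factor).
# Each rule whose feature is present scales the current duration.
RULES = [
    ("phrase_final", 1.4),  # lengthen before a phrase boundary
    ("stressed", 1.2),      # lengthen in a stressed syllable
    ("in_cluster", 0.8),    # shorten inside a consonant cluster
]

def apply_rules(phone, context):
    """Start from the baseline duration and apply each matching rule in turn."""
    duration = BASELINE_MS[phone]
    for feature, factor in RULES:
        if context.get(feature, False):
            duration *= factor
    return duration

# A stressed, phrase-final "a": 90 ms scaled by 1.4 and then by 1.2.
example = apply_rules("a", {"stressed": True, "phrase_final": True})
```

With these assumed values the example evaluates to about 151.2 ms. Since the factors are multiplicative, the order in which the rules fire does not change the result; an additive model would instead add a per-factor offset to the baseline.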
