The Importance of Text-to-Speech Synthesis


Introduction: In this chapter, we take a close look at two important issues in text-to-speech synthesis, prosody modeling and waveform generation, and review popular techniques for each. Both steps are important for generating natural-sounding speech. At the perceptual level, naturalness in speech is attributed to properties of the speech signal related to audible changes in pitch, loudness, and syllabic length, collectively called prosody. Acoustically, these changes correspond to variations in the fundamental frequency (F0), amplitude, and duration of speech units [2, 4]. Prosody is important for speech synthesis because it conveys aspects of meaning and structure that are not implicit in the segmental …

As an example, the phrase “I feel great about it, we should celebrate!” contains two main prosodic phrases, with the boundary at the punctuation mark. Prosodic phrasing involves finding meaningful prosodic phrases of this kind, whose boundaries may or may not be explicit. It is important because it not only makes synthesized speech easier to understand but also helps ascribe meaning to certain parts of the synthesized speech by varying prosody, in the same way humans do. Prosodic boundaries are created at explicit identifiers such as punctuation marks, or at certain lexical or grammatical words known to be phrase delimiters for a language. Several researchers have used variants of context-free grammars, such as augmented transition networks (ATNs), definite-clause grammars (DCGs), and unification grammars (UGs), to model the syntactic-prosodic structures of a language and to identify the prosodic phrases in the input text [29–31]. Another approach is to use statistical models, with probabilistic predictors such as CART decision trees, to predict prosodic phrases from features such as the parts of speech of the surrounding words, the length of the utterance in words, the distance of a potential boundary from the beginning or the end of the utterance, and whether the surrounding words are accented [32].

Even though rules based on punctuation are good predictors of prosodic phrases, there are many cases where no explicit punctuation mark indicates the phrase boundary. This problem is prominent in Indian languages, where punctuation marks are used little or not at all. Sridhar [33] uses an elementary deterministic rule-based phrasing model for Hindi using the content/function word …
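To make the punctuation-based approach described above concrete, here is a minimal sketch in Python. It is not taken from any of the cited systems: the function names `split_prosodic_phrases` and `boundary_features`, the feature names, and the set of punctuation marks treated as delimiters are all assumptions made for illustration. The first function places a boundary after each punctuation mark. The second computes a few of the positional features mentioned for statistical predictors; part-of-speech and accent features are left out because they would need a tagger.

```python
import re

# Punctuation marks treated as explicit phrase delimiters.
# This set is an assumption for the sketch, not a rule from the cited work.
BOUNDARY_PUNCT = ",;:.!?"

# Split on whitespace that directly follows one of the delimiter marks.
_BOUNDARY_RE = re.compile(r"(?<=[" + re.escape(BOUNDARY_PUNCT) + r"])\s+")


def split_prosodic_phrases(text):
    """Return candidate prosodic phrases, split at punctuation marks."""
    pieces = _BOUNDARY_RE.split(text.strip())
    return [piece for piece in pieces if piece]


def boundary_features(words, i):
    """Positional features for the potential boundary after words[i].

    Hypothetical feature names; a real predictor such as a CART tree
    would also use part-of-speech and accent information.
    """
    return {
        "utterance_length": len(words),
        "dist_from_start": i + 1,
        "dist_from_end": len(words) - (i + 1),
        "ends_with_punct": words[i][-1] in BOUNDARY_PUNCT,
    }


sentence = "I feel great about it, we should celebrate!"
phrases = split_prosodic_phrases(sentence)
features = boundary_features(sentence.split(), 4)  # boundary after "it,"
print(phrases)
print(features)
```

On the example sentence, the split yields the two phrases identified in the text. When no punctuation is present, the function returns the whole input as a single phrase, which is exactly the case where word-based or statistical predictors are needed.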
