Applied Speech and Audio Processing: With matlab examples
Characteristics of speech
3.2. Characteristics of speech
Despite many differences between individuals, and the existence of many languages, speech follows general patterns, and on average has well-defined characteristics such as those of volume, frequency distribution, pitch rate and syllabic rate [3]. These characteristics have adapted with regard to environment, hearing and voice production limitations (speech characteristics fit the speech-generating abilities of the body), but the rapid changes in society over the past century have exceeded our ability to adapt. The shouting mechanism for ‘long distance’ communications, for example across an open valley, is not particularly suited to inner-city conditions (just stand outside a tower block for a few minutes on a hot day when windows are open and hear the examples of inappropriately loud vocal communications). On the other hand, rail or bus commuters will seldom have the opportunity to converse in whispers.

Infobox 3.2 The International Phonetic Alphabet
The International Phonetic Alphabet (IPA) is the usual method of describing and writing the various phonemes that make up speech. As defined by the International Phonetic Association, a set of symbols, written using a shorthand notation, describes the basic sound units of words. These symbols can completely describe many different languages using the 107 letters and several diacritical marks available [3]. It is beyond the scope of this book to introduce this alphabet, but researchers working with phonetics would be advised to learn the IPA and apply this notation in their work to avoid misconceptions and insufficiently specified speech sounds.

3.2.1 Speech classification
Physically, the sounds of speech can be described in terms of a pitch contour and formant frequencies. In fact, this description forms a method of analysis used by most speech compression algorithms (discussed in Section 5.2 and beyond).
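The pitch half of this description can be illustrated with a minimal sketch, assuming a simple autocorrelation estimator applied to one voiced frame. The function name `estimate_pitch`, the search range of 60 to 400 Hz and the synthetic test frame are assumptions made for illustration; they are not taken from the text, and practical pitch trackers add voicing detection and smoothing across frames to build a contour.

```python
import math


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Searches for the lag with the largest autocorrelation inside the
    assumed pitch range [f_min, f_max]. Returns None if no lag has a
    positive autocorrelation.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


def harmonic_frame(f0, sample_rate=8000, length=480, harmonics=5):
    """Synthetic voiced frame: a sum of harmonics of f0 with 1/k amplitudes."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * i / sample_rate) / k
            for k in range(1, harmonics + 1))
        for i in range(length)
    ]


if __name__ == "__main__":
    frame = harmonic_frame(100.0)
    print(estimate_pitch(frame, 8000))
```

Repeating the estimate on successive overlapping frames would give one pitch value per frame, which is the kind of sequence a pitch contour describes.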
Formants are resonant frequencies of the vocal tract which appear in the speech spectrum as clear peaks. As an example, three distinct formant peaks can be seen in the frequency domain plot of a short speech recording in Figure 3.2. Formants have been described by the famous researcher Klatt and others as being the single most important feature in speech communications [4]. Generally, many formants will be present in a typical utterance, and their locations will vary over time as the shape of the mouth changes. Formants are counted from the lowest frequency upwards, and usually only the first three (F1, F2 and F3) contribute significantly to the intelligibility of speech. Some fricative sounds like /ch/ can produce many formants, but generally speaking F1 contains most of the speech energy while F2 and F3 between them contribute more to speech intelligibility [5].
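The idea of formants as spectral peaks, counted from the lowest frequency upwards, can be sketched as follows. This is a toy illustration under stated assumptions: the frame is a sum of three sinusoids at hypothetical frequencies (500, 1500 and 2500 Hz) standing in for F1 to F3, with the largest amplitude on the lowest one, and the helper names are invented for this example. Real speech also contains pitch harmonics, so practical formant estimators usually smooth the spectrum first (for example with linear prediction) rather than picking peaks from the raw spectrum.

```python
import cmath
import math


def toy_vowel(sample_rate=8000, length=256):
    """Toy frame: three sinusoids standing in for formant peaks (assumed values)."""
    components = [(500.0, 1.0), (1500.0, 0.6), (2500.0, 0.3)]  # (Hz, amplitude)
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in components)
        for i in range(length)
    ]


def magnitude_spectrum(frame, sample_rate):
    """Return (frequencies, magnitudes) for the non-negative DFT bins of a frame."""
    n = len(frame)
    # A Hamming window reduces spectral leakage between neighbouring bins.
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        # Direct DFT sum: slow, but needs no external libraries.
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)
        )
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def pick_peaks(freqs, mags, count=3):
    """Return the frequencies of the `count` largest local maxima, lowest first."""
    peaks = [
        (mags[k], freqs[k])
        for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    peaks.sort(reverse=True)  # strongest peaks first
    return sorted(f for _, f in peaks[:count])


if __name__ == "__main__":
    freqs, mags = magnitude_spectrum(toy_vowel(), 8000)
    print(pick_peaks(freqs, mags))
```

Sorting the selected peaks by frequency mirrors the counting convention in the text: the first value returned plays the role of F1, the second F2 and the third F3.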