Applied Speech and Audio Processing: With MATLAB Examples
Speech
(e) If the air travels through the mouth, a humped tongue and an opening then closing lower jaw cause a vowel sound (e.g. /a/ in ‘card’); if the lower jaw does not close, a glide (e.g. /w/ in ‘won’) is the result.
(f) Different sounds also result if the air is forced past the sides of a tongue touching the roof of the mouth or the teeth (e.g. /l/ in ‘luck’, and the /th/ sound).

The above actions must be strung together by the speaker in order to construct coherent sentences. In practice, sounds will slur and merge into one another to some extent, such as the latter part of a vowel sound changing depending on the following sound. This can be illustrated by considering how the /o/ sounds in ‘or’ and in ‘of’ differ.

Infobox 3.1 The structure of speech

A phoneme is the smallest structural unit of speech: there may be several of these comprising a single word. Usually we write phonemes between slashes to distinguish them, thus /t/ is the phoneme that ends the word ‘cat’. Phonemes often comprise distinctly recognisable phones which may vary widely to account for different spoken pronunciations.

Two alternative pronunciations of a phoneme are usually the result of a choice between two phones that could be used within that phoneme. In such cases, the two alternative phones are termed allophones. Interestingly, phones which are identical except in their spoken tone can be called allotones, something which is very common in Mandarin Chinese, where many phonemes can be spoken with a choice of tone to totally change the meaning of a word.

Single or clustered phonemes form units of sound organisation called syllables, which generally allow a natural rhythm in speaking. Syllables usually contain some form of initial sound, followed by a nucleus and then a final. Both the initial and the final are optional, and if present are typically consonants, while the syllable nucleus is usually a vowel.

Technically a vowel is a sound spoken with an open vocal tract as explained in Section 3.1, while a consonant is one spoken with a constricted, or partially constricted, vocal tract. As with many research areas, however, these definitions, which are so clear and unambiguous on paper, are blurred substantially in practice.

The merging of phonemes and words together is one major difficulty in speech processing, especially in the field of continuous speech recognition. For simple, single-syllable words, the obvious gaps in a waveform plot will correspond to demarcation points, but as the complexity of an utterance increases, these demarcations become less and less obvious, and often the noticeable gaps are mid-word rather than between words. These difficulties have made speech segmentation a flourishing research area (see also Section 7.5.4).

Finally, spoken enunciation is context sensitive. When background noise is present we shout; during extreme quiet we whisper. This does not always hold true for communications channels: imagine a man in a quiet office telephoning his wife in a noisy shopping mall. The husband will naturally talk fairly quietly in order not to disturb his colleagues, but the wife will have to shout to be heard. Possibly the wife will ask the office-bound husband to speak up a little and the husband will then ask the wife to stop shouting (this example is related from personal experience).
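The remark above that, for simple utterances, obvious gaps in a waveform correspond to demarcation points can be illustrated with a rough sketch. The code below is not taken from the book: it is written in plain Python rather than MATLAB, and the frame length, relative energy threshold, minimum gap duration and the names `frame_energies`, `find_gaps` and `tone` are all assumptions chosen for illustration. It marks a gap wherever short-time energy stays low for long enough, using a synthetic signal (two tone bursts separated by silence) in place of real speech.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_gaps(samples, sample_rate, frame_ms=20, rel_threshold=0.05, min_gap_ms=60):
    """Return (start_s, end_s) spans where frame energy stays below a threshold.

    frame_ms, rel_threshold and min_gap_ms are illustrative values, not tuned ones.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    # Threshold is set relative to the loudest frame (an assumption).
    threshold = rel_threshold * max(energies)
    min_frames = math.ceil(min_gap_ms / frame_ms)

    gaps = []
    run_start = None
    # The infinite sentinel closes any low-energy run that reaches the end.
    for idx, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            if idx - run_start >= min_frames:
                gaps.append((run_start * frame_len / sample_rate,
                             idx * frame_len / sample_rate))
            run_start = None
    return gaps


def tone(duration_ms, sample_rate, freq_hz=200.0):
    """Synthetic stand-in for a voiced sound: a unit-amplitude sine burst."""
    n = sample_rate * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate) for k in range(n)]


fs = 8000  # sample rate in Hz (assumed)
silence = [0.0] * (fs * 200 // 1000)
signal = tone(300, fs) + silence + tone(300, fs)

gaps = find_gaps(signal, fs)
print(gaps)  # → [(0.3, 0.5)]
```

On this clean synthetic input the single 200 ms silence is recovered as one gap. A detector of this kind is exactly what the text warns about: on continuous speech, low-energy stretches often fall mid-word (for example during stop consonants), so energy gaps alone would not be a reliable indicator of word boundaries.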