Applied Speech and Audio Processing: With MATLAB Examples

(e) If the air travels through the mouth, a humped tongue and an opening then closing
lower jaw cause a vowel sound (e.g. /a/ in ‘card’); if the lower jaw does not close, a
glide (e.g. /w/ in ‘won’) is the result.
(f) Different sounds also result if the air is forced past the sides of a tongue touching the
roof of the mouth or the teeth (e.g. /l/ in ‘luck’, and the /th/ sound).
The above actions must be strung together by the speaker in order to construct coherent
sentences. In practice, sounds will slur and merge into one another to some extent:
for example, the latter part of a vowel sound changes depending on the sound that follows.
This can be illustrated by considering how the /o/ sounds in ‘or’ and in ‘of’ differ.
Infobox 3.1 The structure of speech
A phoneme is the smallest structural unit of speech: there may be several of these comprising
a single word. Usually we write phonemes between slashes to distinguish them, thus /t/ is the
phoneme that ends the word ‘cat’. Phonemes often comprise distinctly recognisable phones which
may vary widely to account for different spoken pronunciations.
Two alternative pronunciations of a phoneme are usually the result of a choice between two
phones that could be used within that phoneme. In such cases, the alternative phones are termed
allophones. Interestingly, phones which are identical except in their spoken tone can be called
allotones, something which is very common in Mandarin Chinese, where many phonemes can be
spoken with a choice of tone that totally changes the meaning of a word.
Single or clustered phonemes form units of sound organisation called syllables which generally
allow a natural rhythm in speaking. Syllables usually contain some form of initial sound, followed
by a nucleus and then a final. Both the initial and the final are optional, and if present are typically
consonants, while the syllable nucleus is usually a vowel.
Technically a vowel is a sound spoken with an open vocal tract as explained in Section 3.1,
while a consonant is one spoken with a constricted, or partially constricted, vocal tract. As with
many research areas, however, these definitions, which are so clear and unambiguous on paper,
are blurred substantially in practice.
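The initial, nucleus and final structure of a syllable described in the infobox can be sketched as a small data structure. The Python fragment below is illustrative only: the vowel set, the simplified phoneme symbols and the function name split_syllable are assumptions made for this example, and it treats the nucleus as always being a vowel, although the text says only that it usually is.

```python
# Toy vowel phoneme symbols (an assumption, not a real phoneme inventory)
VOWELS = {"a", "e", "i", "o", "u"}

def split_syllable(phonemes):
    """Split the phonemes of ONE syllable into (initial, nucleus, final).

    The nucleus is taken to be the first contiguous run of vowels;
    the initial and the final may each be empty, since both are optional.
    """
    # Locate the first vowel: everything before it is the initial
    start = 0
    while start < len(phonemes) and phonemes[start] not in VOWELS:
        start += 1
    if start == len(phonemes):
        raise ValueError("no vowel nucleus found")
    # Extend over the contiguous vowel run: this is the nucleus
    end = start
    while end < len(phonemes) and phonemes[end] in VOWELS:
        end += 1
    return phonemes[:start], phonemes[start:end], phonemes[end:]

# 'cat' written with simplified symbols: initial /k/, nucleus /a/, final /t/
initial, nucleus, final = split_syllable(["k", "a", "t"])
```

A syllable consisting of a lone vowel returns an empty initial and an empty final, which matches the statement that both are optional.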
The merging of phonemes and words together is one major difficulty in speech
processing, especially in the field of continuous speech recognition. For simple, single
syllable words, the obvious gaps in a waveform plot will correspond to demarcation
points, but as the complexity of an utterance increases, these demarcations become less
and less obvious, and often the noticeable gaps are mid-word rather than between words.
These difficulties have made speech segmentation a flourishing research area (see
also Section 7.5.4).
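The idea of locating the obvious gaps in a waveform can be sketched with a simple short-time energy threshold. This is a minimal illustration written in Python, not the segmentation method of Section 7.5.4: the function names, the 80-sample frame length, the energy threshold and the minimum gap duration are all assumed values chosen for a synthetic test signal, and on real speech such a detector would also flag the mid-word gaps mentioned above.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def find_gaps(samples, frame_len=80, threshold=1e-4, min_frames=3):
    """Return (start_sample, end_sample) spans where the frame energy stays
    below `threshold` for at least `min_frames` consecutive frames.
    All three parameters are illustrative assumptions."""
    gaps = []
    run_start = None
    energies = frame_energies(samples, frame_len)
    # An infinite sentinel closes any low-energy run at the end of the signal
    for i, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_frames:
                gaps.append((run_start * frame_len, i * frame_len))
            run_start = None
    return gaps

# Synthetic 'utterance': two 200 Hz tone bursts separated by 100 ms of
# silence, sampled at 8 kHz (stands in for two single-syllable words)
fs = 8000
burst = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
silence = [0.0] * 800
signal = burst + silence + burst

gaps = find_gaps(signal)  # one gap, spanning samples 1600 to 2400
```

For this clean synthetic signal the single gap coincides with the boundary between the two bursts. In continuous speech no such correspondence can be assumed, which is exactly the difficulty the text describes.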
Finally, spoken enunciation is context sensitive. When background noise is present
we shout; during extreme quiet we whisper. This does not always work well over
communications channels: imagine a man in a quiet office telephoning his wife in a noisy
shopping mall. The husband will naturally talk fairly quietly in order not to disturb his
colleagues, but the wife will have to shout to be heard. Possibly the wife will ask the
office-bound husband to speak up a little and the husband will then ask the wife to stop
shouting. This example is related from personal experience.
