Applied Speech and Audio Processing: With MATLAB Examples


3.3. Speech understanding

Intelligibility: Although this will be discussed more fully in the next section, it is worth
noting here that most of the information (in terms of what was said) transmitted by speech
lies above 1 kHz, carried by formants F2 and F3 as mentioned previously. Removal
of all speech energy between 800 Hz and 3 kHz would leave a signal that sounded
like speech but which was completely unintelligible [14]. This effect is illustrated in
Figure 3.4.

Figure 3.3. Long-time averaged speech power distribution plotted against frequency, with the
approximate regions of the first three formants identified through vertical grey bands.

Figure 3.4. Effect of limiting speech frequency range on the intelligibility of speech syllables,
measured as articulation index.

Figure 3.4 indicates that if a speech signal were low-pass filtered at 1 kHz,
around 25% of speech syllables would be recognisable. If it were high-pass filtered at
2 kHz, around 70% would be recognisable.
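
The kind of band-limiting described here can be tried out with a simple filter. The following is a minimal Python sketch, not taken from the book: it designs a Hamming-windowed sinc low-pass filter with a 1 kHz cutoff and applies it to a synthetic two-tone signal standing in for speech. The 8 kHz sample rate, the 101-tap filter length, the test tones at 500 Hz and 2500 Hz, and all function names are assumptions chosen for illustration.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Design a Hamming-windowed sinc low-pass FIR filter (num_taps should be odd)."""
    fc = cutoff_hz / fs_hz            # normalised cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response, with the k == 0 limit handled separately
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]   # normalise to unity gain at 0 Hz

def fir_filter(taps, signal):
    """Direct-form FIR filtering; output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out

def tone_amplitude(signal, freq_hz, fs_hz):
    """Estimate the amplitude of one sinusoidal component by correlation."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / fs_hz) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

fs = 8000                             # assumed sample rate in Hz
# Synthetic stand-in for speech: one tone below and one tone above the cutoff
x = [math.sin(2 * math.pi * 500 * i / fs) + math.sin(2 * math.pi * 2500 * i / fs)
     for i in range(1600)]

h = lowpass_fir(1000, fs)
y = fir_filter(h, x)

steady = y[160:]                      # skip the filter start-up transient
low_amp = tone_amplitude(steady, 500, fs)
high_amp = tone_amplitude(steady, 2500, fs)
print("500 Hz amplitude after filtering:  %.4f" % low_amp)
print("2500 Hz amplitude after filtering: %.4f" % high_amp)
```

The 500 Hz component passes almost unchanged while the 2500 Hz component is strongly attenuated. Applying the same filter to recorded speech would give a rough impression of the low-pass condition discussed above. A high-pass version could be obtained by spectral inversion of the same taps, although that step is not shown here.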
3.2.5. Temporal distribution
Temporally, the major constraint on speech is how fast the brain and vocal apparatus
can attempt to articulate phonemes or syllables. The various muscles involved in vocal
production can only move so fast, as can the muscles controlling the lungs. A further
constraint on lung muscle movement is the need for regular lung re-filling required to
prevent asphyxia.
Evidence suggests that the speed of articulation is mostly independent of the rate of
speaking. Even when speaking more quickly, most people will use the same length of
time to articulate a particular syllable, but will reduce the length of the gaps between
syllables and words [4]. This uniformity does, of course, greatly assist in the artificial
description and modelling of speech.
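
The distinction between speed of articulation and rate of speaking can be made concrete with a little arithmetic. The Python sketch below uses purely hypothetical durations (0.2 s per syllable, with gaps of 0.1 s for slower speech and 0.05 s for faster speech); these numbers are illustrative assumptions, not measurements from the book or its references. The function name `rates` is likewise invented for this example.

```python
def rates(syllable_durations_s, gap_durations_s):
    """Return (speaking rate, articulation rate) in syllables per second.

    Speaking rate counts the gaps between syllables; articulation rate
    counts only the time spent actually producing syllables.
    """
    n = len(syllable_durations_s)
    articulating = sum(syllable_durations_s)
    total = articulating + sum(gap_durations_s)
    return n / total, n / articulating

# Hypothetical values: ten syllables of equal length, nine gaps between them
syllables = [0.2] * 10
slow_gaps = [0.1] * 9      # slower speech: longer gaps
fast_gaps = [0.05] * 9     # faster speech: same syllables, shorter gaps

slow_speaking, slow_artic = rates(syllables, slow_gaps)
fast_speaking, fast_artic = rates(syllables, fast_gaps)

print("slow: %.2f syll/s spoken, %.2f syll/s articulated" % (slow_speaking, slow_artic))
print("fast: %.2f syll/s spoken, %.2f syll/s articulated" % (fast_speaking, fast_artic))
```

Under these assumed values the speaking rate rises when the gaps shrink, while the articulation rate stays the same, which is the pattern the paragraph above describes.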
Of all the constraints on speech, the speed at which the muscles are capable of moving is
the most useful to us [15]. It allows speech to be treated as semi-stationary over periods
of about 20 ms, meaning that speech analysis (including short-time Fourier analysis,
linear predictive analysis and pitch detection) is usually conducted over frames of about
that duration. This property is termed pseudo-stationarity (see Section 2.5.1).
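
In practice, working over roughly 20 ms periods means cutting the signal into short frames before analysis. The following is a minimal Python sketch of that framing step. The 8 kHz sample rate, the 50% overlap default, and the function name `split_into_frames` are assumptions for illustration; the text above specifies only the approximate 20 ms duration.

```python
def split_into_frames(samples, fs_hz, frame_ms=20.0, overlap=0.5):
    """Split a sample sequence into fixed-length, optionally overlapping frames.

    Trailing samples that do not fill a whole frame are discarded.
    """
    frame_len = int(round(fs_hz * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

fs = 8000                          # assumed sample rate in Hz
one_second = [0.0] * fs            # placeholder signal: one second of silence

overlapped = split_into_frames(one_second, fs)               # 50% overlap
contiguous = split_into_frames(one_second, fs, overlap=0.0)  # no overlap

print("samples per frame:", len(overlapped[0]))
print("frames with 50% overlap:", len(overlapped))
print("frames with no overlap:", len(contiguous))
```

At 8 kHz a 20 ms frame holds 160 samples. Each frame would then typically be windowed and passed to whichever analysis is required, such as a short-time Fourier transform or linear prediction, on the assumption that the speech characteristics change little within the frame.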
One further temporal generalisation is that of syllabic rate, the rate at which syllables
are articulated. For most languages this remains fairly constant, and varies only slightly
between individuals and types of speech [6].
One or more agglomerated phonemes can make up a syllable sound. For humans, it
seems that the simplest unit of recognition may be the syllable, whereas the phoneme
is a distinction generally made by speech researchers aiming to determine a set of basic
building blocks for speech sounds. Phoneme duration varies from language to language
and from speaker to speaker, and also depends upon the exact phoneme. There is even
evidence to tie phoneme length to word stress: louder, more emphasised phonemes
tend to exhibit a longer duration.
