Applied Speech and Audio Processing: With MATLAB Examples

Characteristics of speech


Finally, replace all vowels with the same phoneme and read again:
Tha yallaw dag had flaas.
Apart from utterly humiliating such sceptics by making them sound stupid, the exercise makes it immediately obvious that although the same-vowel sentence sounds odd, it is still highly intelligible. By contrast, the same-consonant sentence is utterly unintelligible. This simple example illustrates that although vowels are spoken louder, they tend to convey less intelligibility than the quieter consonants.
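The substitution can be reproduced with a few lines of code. The following is a minimal Python sketch, not taken from the book. It assumes the starting sentence was "The yellow dog had fleas" (inferred from the same-vowel version above) and treats the letters a, e, i, o, u as stand-ins for vowel phonemes. The function names and the choice of "t" as the single consonant are hypothetical.

```python
VOWELS = "aeiou"


def replace_vowels(text, vowel="a"):
    """Replace every vowel letter with one vowel, preserving letter case."""
    out = []
    for ch in text:
        if ch.lower() in VOWELS:
            out.append(vowel.upper() if ch.isupper() else vowel)
        else:
            out.append(ch)
    return "".join(out)


def replace_consonants(text, consonant="t"):
    """Replace every consonant letter with one consonant, preserving case."""
    out = []
    for ch in text:
        if ch.isalpha() and ch.lower() not in VOWELS:
            out.append(consonant.upper() if ch.isupper() else consonant)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sentence = "The yellow dog had fleas."  # assumed original sentence
    print(replace_vowels(sentence))       # same-vowel version
    print(replace_consonants(sentence))   # same-consonant version
```

Reading the two printed lines aloud gives a rough feel for the contrast described above, although spelling is only an approximation of the spoken phonemes.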
3.2.4 Frequency distribution
The frequency distribution of speech follows the sensitivity of the human ear fairly closely: most of the frequencies involved in speech, and certainly all of those that convey significant intelligibility, lie within the range of frequencies over which the ear is most sensitive. However, within this band (about 300 Hz to 4 kHz) there is a mismatch between the speech frequencies of greatest energy and those of greatest intelligibility. Put another way, the speech frequencies with the greatest concentration of power are not quite the same as those that account for most transmitted intelligibility [12]; this disparity is hinted at by the vowel/consonant difference in the previous section. To examine this further, let us now consider power and intelligibility separately.
Speech power: Most of the energy transmitted in speech is concentrated at low frequencies, approximately 500 Hz for males and 800 Hz for females. These frequencies are not essential for intelligibility: experiments in which they are removed indicate that the remaining speech, whilst quiet and unusual in sound, can still be perfectly intelligible. That is to say, the spoken information in the speech remains, whereas the ability to recognise the speaker is severely impaired. Typically around 84% of the energy in speech is located below 1 kHz, as shown in Figure 3.3, constructed from data provided in [6] and [13].
Figure 3.3 also contains bands indicating the ranges where the first three formants usually lie. Note the correspondence between F1 and the band of greatest energy distribution.
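The idea of measuring how much energy lies below a given frequency can be sketched in code. The example below is an illustrative Python sketch, not the book's method. It uses a naive discrete Fourier transform so that it needs no external libraries, and it analyses a synthetic two-tone signal, not real speech. The sample rate, frame length, tone frequencies and amplitudes are all assumed values chosen so that the expected answer is easy to check by hand (amplitudes 1.0 and 0.5 give an energy split of 1 : 0.25, so 80% lies in the lower tone). The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def energy_fraction_below(x, fs, cutoff_hz):
    """Fraction of the energy of real signal x in bins below cutoff_hz."""
    n = len(x)
    spectrum = dft(x)
    half = n // 2 + 1  # bins from 0 Hz up to the Nyquist frequency
    power = [abs(spectrum[k]) ** 2 for k in range(half)]
    # Double the interior bins to account for their negative-frequency twins
    # (the 0 Hz bin and, for even n, the Nyquist bin have no twin).
    for k in range(1, half):
        if not (n % 2 == 0 and k == n // 2):
            power[k] *= 2
    total = sum(power)
    if total == 0:
        return 0.0  # silent frame: no energy to apportion
    low = sum(p for k, p in enumerate(power) if k * fs / n < cutoff_hz)
    return low / total


if __name__ == "__main__":
    fs = 8000  # assumed sample rate in Hz
    n = 160    # assumed frame length, giving 50 Hz bin spacing
    # Synthetic stand-in for speech: a strong 500 Hz tone plus a weaker 2 kHz tone.
    x = [
        1.0 * math.sin(2 * math.pi * 500 * t / fs)
        + 0.5 * math.sin(2 * math.pi * 2000 * t / fs)
        for t in range(n)
    ]
    print(round(energy_fraction_below(x, fs, 1000), 3))
```

With recorded speech in place of the synthetic signal, and an FFT routine in place of the naive transform, the same calculation could be compared against the figure of roughly 84% quoted above, although the result would depend on the speaker and the recording.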
