Applied Speech and Audio Processing: With MATLAB Examples
Intelligibility: Although this will be discussed more fully in the next section, it is worth noting here that most of the information (in terms of what was said) transmitted by speech lies above 1 kHz, carried by formants F2 and F3 as mentioned previously. Removing all speech energy between 800 Hz and 3 kHz would leave a signal that still sounded like speech but was completely unintelligible [14]. This effect is illustrated in Figure 3.4.

Figure 3.3: Long-time averaged speech power distribution plotted against frequency, with the approximate regions of the first three formants identified by vertical grey bands.

Figure 3.4: Effect of limiting the speech frequency range on the intelligibility of speech syllables, measured as articulation index.

Figure 3.4 shows that if a speech signal were low-pass filtered at 1 kHz, around 25% of speech syllables would be recognisable, whereas if it were high-pass filtered at 2 kHz, around 70% would be recognisable.

3.2.5 Temporal distribution

In temporal terms, the major constraint on speech is how fast the brain and vocal apparatus can articulate phonemes or syllables. The various muscles involved in vocal production can only move so fast, as can the muscles controlling the lungs. A further constraint on lung muscle movement is the need to refill the lungs regularly to prevent asphyxia.

Evidence suggests that the speed of articulation is mostly independent of the rate of speaking. Even when speaking more quickly, most people take the same length of time to articulate a particular syllable, but shorten the gaps between syllables and words [4]. This uniformity greatly assists the artificial description and modelling of speech. Of all the constraints on speech, the speed at which the muscles are capable of moving is the most useful to us [15].
It allows speech to be treated as semi-stationary over periods of about 20 ms, meaning that speech analysis (including short-time Fourier analysis, linear predictive analysis and pitch detection) is usually conducted over frames of that duration. This is termed pseudo-stationarity (see Section 2.5.1).

One further temporal generalisation is that of syllabic rate, the rate at which syllables are articulated. For most languages this remains fairly constant, and varies only slightly between individuals and types of speech [6]. A syllable is made up of one or more agglomerated phonemes. For humans, it seems that the simplest unit of recognition may be the syllable, whereas the phoneme is a distinction generally made by speech researchers seeking a set of basic building blocks for speech sounds.

Phoneme duration varies from language to language and from speaker to speaker, and differs from one phoneme to another. There is even evidence linking phoneme length to word stress: louder, more emphasised phonemes tend to have a longer duration.
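The roughly 20 ms pseudo-stationary interval translates directly into how a recording is cut up before analysis. The sketch below is a plain-Python illustration, not the book's MATLAB code. It assumes a 10 ms hop (50% overlap), a Hamming window and the discarding of trailing samples that do not fill a whole frame; none of these choices comes from the text. The helper names `frame_signal`, `hamming` and `frame_energies` are hypothetical. At an 8 kHz sampling rate a 20 ms frame is 160 samples; at 16 kHz it is 320.

```python
import math

# Sketch: split a sampled signal into ~20 ms analysis frames, the duration
# over which speech is usually treated as pseudo-stationary.
# Assumptions (not from the text): 10 ms hop, Hamming window, and trailing
# samples that do not fill a whole frame are discarded.


def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Return a list of frames (lists of samples) taken from x, sampled at fs Hz."""
    frame_len = int(round(fs * frame_ms / 1000.0))  # e.g. 160 samples at 8 kHz
    hop = int(round(fs * hop_ms / 1000.0))          # e.g. 80 samples at 8 kHz
    if frame_len < 2 or hop < 1:
        raise ValueError("frame must span at least 2 samples and hop at least 1")
    return [list(x[start:start + frame_len])
            for start in range(0, len(x) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_energies(frames):
    """Short-time energy of each windowed frame: sum of (w[k] * x[k])**2."""
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [sum((s * wk) ** 2 for s, wk in zip(f, w)) for f in frames]


if __name__ == "__main__":
    fs = 8000  # 8 kHz sampling rate
    # One second of a 200 Hz tone as a stand-in for a voiced speech signal.
    tone = [math.sin(2.0 * math.pi * 200.0 * t / fs) for t in range(fs)]
    frames = frame_signal(tone, fs)
    print(len(frames), len(frames[0]))  # 99 160
```

Expressing the frame and hop in milliseconds rather than samples keeps the analysis tied to the stationarity interval regardless of sampling rate. Each frame could then be handed to a short-time Fourier transform, linear predictive analysis or a pitch detector.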