Applied Speech and Audio Processing: With MATLAB Examples
Advanced topics
Figure 7.3 Comparison of several critical-band spreading functions from various authors. The models are applied on a per-band basis: the centre frequency of each critical band is used as a reference, and sound falling within that critical band will mask other sounds as described by the function. For example, using the Hermansky model, a sound within the critical band plotted will mask a sound 0.8 Bark lower whose amplitude is 0.2 times that of the masker or less.

7.2 Perceptual weighting

The perceptual error weighting filter (PEWF) is a common sight within speech coders. In this context it has a particular meaning, which may not be quite the same as its interpretation elsewhere. Because it originates in speech coding, it makes use of linear prediction parameters, which themselves encode vocal tract information. It uses these parameters to 'strengthen' resonances, and thus to increase formant power in the encoded speech. The idea is that, since the formant regions are the most relevant to human perception, the weighting process improves the perception of these regions. Whilst this argument holds for speech, in general music systems perceptual weighting more often involves either a perceptual model or simply a digital version of the A-weighting filter.

Here we present a typical PEWF as found within a CELP speech coder. The LPC synthesis filter is termed H(z), and two bandwidth expansion factors, ζ1 and ζ2, are used, with the relationship ζ1 < ζ2 ≤ 1. The weighting filter W(z) is then defined as:

$$ W(z) = \frac{1 - H(z/\zeta_1)}{1 - H(z/\zeta_2)} \tag{7.6} $$

Remembering that the LPC synthesis filter is defined as:

$$ H(z) = \sum_{k=1}^{P} a_k z^{-k} \tag{7.7} $$

the frequency-scaled version will be:

$$ H(z/\zeta) = \sum_{k=1}^{P} \zeta^{k} a_k z^{-k} \tag{7.8} $$

Taking Equation (7.8) in difference form and substituting it into Equation (7.6), the PEWF is quite simply realised in discrete terms as:

$$ y[n] = x[n] + \sum_{k=1}^{P} a_k \left\{ \zeta_2^{k}\, y[n-k] - \zeta_1^{k}\, x[n-k] \right\} \tag{7.9} $$

This can be applied at the output of a CELP speech decoder to slightly enhance the intelligibility of voiced speech. It can also be used within the CELP encoder to accentuate the importance of any formant regions within the codebook search loop (i.e. to weight the mean-squared matching process toward any formants that may be present). Typically the values of ζ1 and ζ2 are very close to unity; in the past the author has used ζ1 = 0.95 and ζ2 = 1.0 for several systems.
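Equation (7.9) maps directly onto a sample-by-sample loop. The sketch below is written in Python rather than MATLAB and is an illustration, not the book's own code. It assumes a coefficient list `a` holding a_1 to a_P in the sign convention of Equation (7.7), zero initial conditions, and two hypothetical helper names, `pewf` (the time-domain recursion) and `pewf_gain` (the magnitude of W(z) on the unit circle, from Equations (7.6) and (7.8)), so that the formant emphasis can be checked numerically.

```python
import cmath
import math


def pewf(x, a, zeta1=0.95, zeta2=1.0):
    """Apply the perceptual error weighting filter of Equation (7.9):

        y[n] = x[n] + sum_{k=1..P} a_k * (zeta2**k * y[n-k] - zeta1**k * x[n-k])

    a[k-1] holds a_k (sign convention of Equation (7.7)); samples before
    n = 0 are assumed to be zero (zero initial conditions).
    """
    if not zeta1 < zeta2 <= 1.0:
        raise ValueError("expected zeta1 < zeta2 <= 1")
    order = len(a)
    # Bandwidth-expanded coefficients zeta**k * a_k from Equation (7.8)
    b_num = [zeta1 ** k * a[k - 1] for k in range(1, order + 1)]
    b_den = [zeta2 ** k * a[k - 1] for k in range(1, order + 1)]
    y = []
    for n in range(len(x)):
        acc = x[n]
        # Sum over the available past samples only (k <= n)
        for k in range(1, min(order, n) + 1):
            acc += b_den[k - 1] * y[n - k] - b_num[k - 1] * x[n - k]
        y.append(acc)
    return y


def pewf_gain(a, omega, zeta1=0.95, zeta2=1.0):
    """Magnitude |W(e^{j*omega})| of Equation (7.6); omega in radians/sample."""
    def one_minus_h(zeta):
        # 1 - H(z/zeta) evaluated on the unit circle, z = e^{j*omega}
        return 1 - sum(zeta ** k * a[k - 1] * cmath.exp(-1j * omega * k)
                       for k in range(1, len(a) + 1))
    return abs(one_minus_h(zeta1) / one_minus_h(zeta2))


if __name__ == "__main__":
    # Hypothetical single resonance: pole radius 0.9 at a quarter of the sample rate
    r, theta = 0.9, math.pi / 2
    a = [2 * r * math.cos(theta), -r * r]
    print("gain at resonance:", pewf_gain(a, theta))
    print("gain at DC:       ", pewf_gain(a, 0.0))
    print("impulse response: ", pewf([1.0, 0.0, 0.0, 0.0], a))
```

For the single hypothetical resonance in the usage block, with ζ1 = 0.95 and ζ2 = 1.0, `pewf_gain` gives roughly 1.42 at the resonance and roughly 0.96 at DC, so the filter lifts the formant region relative to the rest of the spectrum, as described above. The explicit loop favours clarity over speed; since Equation (7.9) is an ordinary pole-zero recursion, in MATLAB it could equally be run with the built-in `filter` function, using numerator coefficients 1, −ζ1 a_1, −ζ1² a_2, and so on up to k = P, and denominator coefficients 1, −ζ2 a_1, −ζ2² a_2, and so on.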