Applied Speech and Audio Processing: With MATLAB Examples
Audio analysis

6.1 Analysis toolkit
[Figure 6.2: upper graph, amplitude of the recording; lower graph, ZCR plotted against analysis frame.]

Figure 6.2 A 16 kHz sampled speech recording ‘its speech’, containing several stressed high frequency sibilants (plotted in the upper graph), and the ZCR measure corresponding to this (lower graph). The excursions of the ZCR plot follow the high frequency speech components, namely a breathy /i/ at the start, both /s/ sounds, and the final /ch/.

Figure 6.3 Threshold-crossing rate illustrated for a noisy sinewave, showing that the extra zero crossings of Figure 6.1(b) are no longer present.

In most ways, TCR results are very similar to ZCR. In the absence of noise, a TCR plot for a speech recording would resemble that for ZCR as in Figure 6.2.

6.1.2 Frame power

This is a measure of the signal energy over an analysis frame, and is calculated as the sum of the squared magnitude of the sample values in that frame. For speech frame i, with N elements, denoted by x_i(n), the frame power measure is determined from:

$$E_i = \frac{1}{N} \sum_{n=0}^{N-1} |x_i(n)|^2 \qquad (6.2)$$

As with the case of ZCR in Section 6.1.1, the division by N will often be unnecessary in practice.

In MATLAB, that is a very simple formula:

    function [fpow]=fpow(segment)
    % Mean of the squared sample values in one analysis frame, as in (6.2)
    fpow=sum(segment.^2)/length(segment);

Frame power provides a compact representation of the amplitude of the speech. As we have seen in Section 3.2.3, unvoiced speech is spoken with less power than voiced speech, and for this reason, frame power provides another indicator of voiced/unvoiced speech.

The simplicity of the MATLAB function above hides the fact that this calculation requires a multiplication to be performed for each and every sample within the analysis window. In implementation terms this can be relatively ‘expensive’, prompting simplification efforts which led directly to the AMDF below.
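As a rough illustration of how the frame power calculation of (6.2) might be applied across a whole recording rather than a single segment, the sketch below splits a signal into non-overlapping frames and evaluates the measure for each one. The test signal, the frame length and all variable names are assumptions made for this example only: a loud tone followed by a much quieter one stands in for high-power and low-power speech, and is not real speech data.

```matlab
% Hypothetical test signal (not real speech): 1024 samples of a loud
% 250 Hz tone followed by 1024 samples of the same tone, much quieter.
fs = 16000;                          % sample rate in Hz (assumed)
n  = (0:1023)';
tone = sin(2*pi*250*n/fs);           % period of exactly 64 samples
x  = [0.8*tone; 0.1*tone];

N = 128;                             % frame length in samples (assumed)
numFrames = floor(length(x)/N);      % 16 complete frames here

% One frame per column, discarding any incomplete final frame.
frames = reshape(x(1:numFrames*N), N, numFrames);

% Equation (6.2) evaluated for every column at once.
E = sum(frames.^2, 1) / N;

% Without the division by N, every value is simply N times larger,
% so the relative ordering of the frames is unchanged.
Esum = sum(frames.^2, 1);
```

With these assumed values, the first eight entries of E (the loud tone) come out at 0.32 and the last eight (the quiet tone) at 0.005, so a simple comparison of frame power separates the two halves of the signal.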
In fact the similarity of the two measures is illustrated in Figure 6.4, which plots the frame power and AMDF measures together for an example recording of the first seven letters of the alphabet. The speech was recorded with a 16 kHz sample rate, and analysis performed on non-overlapping 128-sample frames. Each of the plots is scaled to a maximum of 1.0 for comparison purposes.

6.1.3 Average magnitude difference function

The average magnitude difference function is designed to provide much of the information of the frame power measure, but without multiplications:

$$\mathrm{AMDF}_i = \frac{1}{N} \sum_{n=0}^{N-1} |x_i(n)| \qquad (6.3)$$

In MATLAB, it is again very simple:

    function [amdf]=amdf(segment)
    % Mean of the absolute sample values in one analysis frame, as in (6.3)
    amdf=sum(abs(segment))/length(segment);

An illustration of AMDF obtained for a sequence of recorded speech is shown in Figure 6.4. This also plots the frame power measure, and illustrates the quite close correspondence of the two measures. Both output high values when speech power is high (such as the /a/ sound in the letter A and both the /c/ and /ee/ sounds of the letter C) and output low measure results when speech power is low.

Although correspondence between frame power and AMDF appears to be quite close in this plot, it should be noted that the AMDF output is higher than frame power during the gaps between words. This is an indicator that the AMDF may be less immune to confusion by noise than the frame power measure.
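The comparison described for Figure 6.4 can be imitated on synthetic data. The sketch below is only a minimal illustration and not the plotting code behind the figure: the three-level test tone and all variable names are assumptions, while the 16 kHz rate, 128-sample non-overlapping frames and scaling to a maximum of 1.0 follow the description in the text. It computes both measures for every frame and then scales each to its own maximum.

```matlab
fs = 16000;                          % sample rate in Hz
N  = 128;                            % non-overlapping frame length

% Hypothetical test signal (not real speech): the same 250 Hz tone at
% three levels, standing in for loud, medium and near-silent passages.
n    = (0:4*N-1)';
tone = sin(2*pi*250*n/fs);
x    = [1.0*tone; 0.5*tone; 0.05*tone];

frames = reshape(x, N, []);          % one frame per column, 12 frames

fp = sum(frames.^2, 1) / N;          % frame power, equation (6.2)
am = sum(abs(frames), 1) / N;        % AMDF as defined in equation (6.3)

% Scale each measure to a maximum of 1.0 so the two can be compared.
fpn = fp / max(fp);
amn = am / max(am);
```

For this assumed signal, the scaled frame power of the medium and near-silent frames is 0.25 and 0.0025, whereas the scaled AMDF is 0.5 and 0.05. Frame power grows with the square of the amplitude and the AMDF grows linearly, so after scaling the AMDF sits above the frame power in every quieter frame. This may be one contributor to the higher AMDF output seen in the gaps between words, although the synthetic tone here cannot confirm that for real recordings.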