Applied Speech and Audio Processing: With MATLAB Examples
6.1. Analysis toolkit
This is illustrated in Figure 6.1(a), where the zero crossings of a sinewave are counted over a certain analysis time. In this case the fundamental frequency of the sinewave causes nine crossings across the plot. However, in the presence of additive noise, the ‘wobble’ in the signal as it crosses the zero axis causes several false counts. In Figure 6.1(b) this leads to an erroneous estimate of the signal's fundamental frequency: in fact, an estimate that would be three times too high.

In Matlab, determining the ZCR is relatively easy, although not particularly elegant:

    function [zcr]=zcr(segment)
    zc=0;
    for m=1:length(segment)-1
        if segment(m)*segment(m+1) > 0
            zc=zc+0;  % same sign: no crossing
        else
            zc=zc+1;  % sign change (or a zero sample): count a crossing
        end
    end
    zcr=zc/length(segment);  % crossings per sample

To illustrate, the Matlab zcr() function above was applied to a recording of speech. The speech was segmented into non-overlapping analysis windows of size 128 samples, and the ZCR determined for each window. The results, plotted in Figure 6.2, show a good correspondence between the ZCR measure and the frequencies present in the speech: higher frequency regions of the recorded speech, such as the /ch/ sound, have a higher ZCR measure.

A pragmatic solution to the problem of noise is to apply a threshold about the zero axis. In essence, this introduces a region of hysteresis whereby a single count is made only when the signal drops from above the maximum threshold to below the minimum threshold, or vice versa. This is called threshold-crossing rate (TCR), and is illustrated in Figure 6.3.

In practice, the advantage of TCR for noise reduction is often achieved by low-pass filtering the speech before a ZCR is calculated. This knocks out the high frequency noise or ‘bounce’ on the signal. Since ZCR is used as a rough approximation of the fundamental pitch of an audio signal, the filter bounds can be set from the highest pitch frequency expected.
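The threshold-crossing idea described above can be sketched in Matlab as follows. This is only an illustrative sketch, not code from the book: the function name tcr_sketch, the single symmetric threshold thresh (so the hysteresis band runs from -thresh to +thresh), and the normalisation by segment length (chosen to mirror zcr()) are all assumptions.

```matlab
function [tcr]=tcr_sketch(segment, thresh)
% TCR_SKETCH  Threshold-crossing rate with hysteresis (hypothetical helper).
%   segment : vector of samples (assumed non-empty)
%   thresh  : positive threshold; the hysteresis band is [-thresh, +thresh]
%   tcr     : full traversals of the band, per sample
state=0;   % 0 = not yet outside the band, +1 = last above, -1 = last below
tc=0;
for m=1:length(segment)
    if segment(m) > thresh
        if state == -1
            tc=tc+1;       % travelled from below -thresh to above +thresh
        end
        state=1;
    elseif segment(m) < -thresh
        if state == 1
            tc=tc+1;       % travelled from above +thresh to below -thresh
        end
        state=-1;
    end
    % samples inside the band leave the state unchanged
end
tcr=tc/length(segment);    % same normalisation as zcr()
end
```

For example, tcr_sketch([1 0.1 -0.1 0.1 -0.1 -1], 0.5) registers a single crossing over six samples, whereas counting sign changes on the same vector finds three, because the small wobble around zero never leaves the band.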
For speech, it must be stressed that the filtered ZCR (or TCR) measure provides only an approximate indication of the content of the signal: unvoiced speech tends to give high ZCR values, and voiced speech tends to give low ZCR values. Noise also tends to produce high ZCR values, which makes ZCR difficult to use for analysis of noisy speech and significantly limits its use in practical applications.
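The contrast between low and high ZCR values can be seen with a short experiment. The sketch below is an assumption-laden stand-in rather than an analysis of real speech: a 125 Hz tone plays the role of a voiced-like signal and white noise the role of an unvoiced-like (or noisy) one, the 8 kHz sample rate and the names tone, noise and zcr_cols are illustrative, and only the 128-sample non-overlapping window comes from the text. The counting rule is the same as in zcr().

```matlab
% Frame-wise ZCR on two synthetic stand-in signals (not real speech).
fs = 8000;                 % assumed sample rate in Hz
N  = 128;                  % window size, as used in the text
K  = 4;                    % number of windows to analyse (arbitrary)
t  = (0:N*K-1)/fs;         % time axis in seconds

tone  = sin(2*pi*125*t + 0.1);   % low-frequency tone, voiced-like stand-in
noise = randn(1, N*K);           % white noise, unvoiced-like stand-in

% Same counting rule as zcr(): a pair of neighbouring samples counts
% unless their product is strictly positive; normalise by window length.
zcr_cols = @(F) sum((F(1:end-1,:) .* F(2:end,:)) <= 0, 1) / size(F,1);

% reshape places one non-overlapping window in each column
z_tone  = zcr_cols(reshape(tone,  N, K));
z_noise = zcr_cols(reshape(noise, N, K));

disp(z_tone);    % one ZCR value per window for the tone
disp(z_noise);   % one ZCR value per window for the noise
```

With these settings each tone window contains four crossings, so z_tone is 4/128 in every window. The noise values change from run to run but are typically near 0.5, far above the tone, which is why a high ZCR alone cannot separate unvoiced speech from background noise.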