Applied Speech and Audio Processing: With MATLAB Examples
6.2 Speech analysis and classification
6.2.1 Pitch analysis

Section 5.3.2.1 discussed the use of long-term prediction to determine the pitch of a speech waveform, a method commonly used in speech coders. There are, however, many alternatives. The most accurate require some form of human involvement, or a measurement device such as an accelerometer or electroglottograph (EGG) placed on the throat to monitor the glottis that produces the pitch. From a recorded speech waveform alone, it is currently impossible to give a definitive answer to the question ‘what is the pitch?’, but some algorithms nevertheless appear to come close. Tests of these pitch analysis algorithms therefore tend to rely on consensus answers. Many methods have been surveyed and compared in an excellent paper by Rabiner et al. [6]. Some of the more mainstream techniques reported by Rabiner and others that operate purely on recorded sound include the following:

• time-domain zero-crossing analysis (perhaps thresholded – see Section 6.1.1);
• time-domain autocorrelation (the method used in Section 5.3.2.1);
• frequency-domain cepstral analysis (see Section 2.6.2.2);
• average magnitude difference function (AMDF) based methods (see Section 6.1.3);
• simplified inverse filtering technique (SIFT);
• LPC and LSP-based methods (such as those in Section 6.1.6);
• time-frequency domain analysis (see Section 6.2.2).

We will not reproduce the excellent work of Rabiner here, but will instead introduce a modern alternative: time-frequency distribution analysis. Whilst it is more complex than most, if not all, of the other methods mentioned, early indications are that it promises better accuracy than the traditional techniques.

6.2.2 Joint time-frequency distribution

Joint Time-Frequency Distribution (TFD) analysis originally emerged in the radar signal processing field, but has begun to be adopted for speech processing in recent years [7].
Its good performance in tracking frequency transitions over time has been noted by speech researchers, and given the importance of pitch in speech systems, it is little surprise that TFD analysis has been applied to pitch determination. We will discuss four joint time-frequency distributions, as described in [8], for use in pitch determination: the Spectrogram Time-Frequency Distribution (STFD), the Wigner–Ville Distribution (WVD), the Pseudo-Wigner–Ville Distribution (PWVD) and the Reassigned Smoothed Pseudo-Wigner–Ville Distribution (RSPWVD). Each attempts to characterise how frequency changes with time, but from a slightly different point of view. Each also has its own strengths and weaknesses in terms of sensitivity to pitch features and implementation complexity, and these trade-offs are well established in the research literature.

The STFD computes the spectrogram distribution of a discrete-time signal x. It corresponds to the squared modulus of the short-time Fourier transform (see Section 2.6).
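To make the STFD concrete, the sketch below computes a spectrogram as the squared modulus of a windowed short-time DFT. It is written in Python using only the standard library rather than MATLAB. The Hann window, frame length, hop size, the 60 to 400 Hz search range and the helper names `hann`, `stfd`, `peak_track` and `autocorr_pitch` are all illustrative assumptions, not the implementation described in [8]. `peak_track` is a deliberately crude pitch estimator that picks the strongest bin in each frame. On real speech it can lock onto a harmonic rather than the fundamental. For comparison, `autocorr_pitch` implements the time-domain autocorrelation approach from the list in Section 6.2.1.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stfd(x, frame_len, hop):
    """Spectrogram TFD: squared modulus of a Hann-windowed short-time DFT.

    Hypothetical helper for illustration. Returns one row per frame, each row
    holding the power in bins 0 .. frame_len // 2.
    """
    w = hann(frame_len)
    tfd = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct O(N^2) DFT at bin k; an FFT would be used in practice.
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            row.append(abs(X) ** 2)
        tfd.append(row)
    return tfd


def peak_track(tfd, fs, frame_len, fmin=60.0, fmax=400.0):
    """Crude pitch track (hypothetical helper): the frequency of the strongest
    STFD bin within [fmin, fmax] Hz in each frame. May pick a harmonic."""
    df = fs / frame_len  # bin spacing in Hz
    lo = math.ceil(fmin / df)
    hi = min(math.floor(fmax / df), frame_len // 2)
    track = []
    for row in tfd:
        k = max(range(lo, hi + 1), key=lambda b: row[b])
        track.append(k * df)
    return track


def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Time-domain autocorrelation estimate (hypothetical helper): returns
    fs / lag for the lag in the assumed pitch range that maximises r[lag]."""
    lags = range(int(fs / fmax), min(int(fs / fmin), len(frame) - 1) + 1)

    def r(lag):
        # Unnormalised autocorrelation at the given lag.
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

    return fs / max(lags, key=r)
```

Consider a synthetic test signal `x`: a 200 Hz cosine plus weaker 400 Hz and 600 Hz harmonics, sampled at 8000 Hz. With 400-sample frames, the bin spacing is 20 Hz. In that case `peak_track(stfd(x, 400, 200), 8000, 400)` gives 200.0 Hz in every frame, and `autocorr_pitch(x[:400], 8000)` also returns 200.0. The direct DFT keeps the sketch dependency-free at the cost of speed.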