Applied Speech and Audio Processing: With MATLAB Examples

Speech analysis and classification

Date: 18.10.2023

6.2.1 Pitch analysis
Section 5.3.2.1 discussed the use of long-term prediction to determine the pitch of a
speech waveform, a method commonly used in speech coders. There are, however,
many alternatives. The most accurate require some form of human involvement, or
a measurement device such as an accelerometer or electroglottograph (EGG) on the
throat or glottis producing the pitch. From a recorded speech waveform alone, it is
currently not possible to give a definitive answer to the question ‘what is the pitch?’,
although some algorithms appear to get close. Tests of these pitch analysis algorithms
therefore tend to rely on consensus answers. For example, many methods have been
surveyed and compared in an excellent paper by Rabiner et al. [6].
Some of the more mainstream techniques reported by Rabiner and others, all of which
operate purely on recorded sound, include the following:
• time-domain zero-crossing analysis (perhaps thresholded – see Section 6.1.1);
• time-domain autocorrelation (the method used in Section 5.3.2.1);
• frequency-domain cepstral analysis (see Section 2.6.2.2);
• average magnitude difference function based methods (see Section 6.1.3);
• simplified inverse filtering technique (SIFT);
• LPC and LSP-based methods (such as those in Section 6.1.6);
• time-frequency domain analysis – explained in Section 6.2.2.
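
As a rough illustration of the time-domain autocorrelation approach in the list above,
the following sketch estimates the pitch of a single frame by finding the lag with the
largest autocorrelation value. It is written in Python rather than MATLAB, and it is not
taken from Rabiner's survey. The function name autocorr_pitch, the 50 to 400 Hz search
range and the synthetic test frame are all assumptions made for this example.

```python
import math

def autocorr_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    f_min and f_max are assumed search limits, not values from the text.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove any DC offset
    lag_min = int(fs / f_max)                # shortest pitch period considered
    lag_max = min(int(fs / f_min), n - 1)    # longest pitch period considered
    best_lag, best_r = 0, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # unnormalised autocorrelation of the frame at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Synthetic voiced-like frame (illustrative only): 125 Hz fundamental
# plus two weaker harmonics, sampled at 8 kHz
fs = 8000
frame = [math.sin(2 * math.pi * 125 * t / fs)
         + 0.5 * math.sin(2 * math.pi * 250 * t / fs)
         + 0.25 * math.sin(2 * math.pi * 375 * t / fs)
         for t in range(512)]
pitch = autocorr_pitch(frame, fs)
print(round(pitch, 1))
```

For this clean synthetic frame the estimate should come out at, or very close to, the
125 Hz fundamental. Real speech would normally also need a voicing decision and some
protection against pitch halving and doubling, neither of which is attempted here.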
We will not reproduce the excellent work of Rabiner here, but will introduce a modern
alternative: time-frequency distribution analysis. Although this method is more complex
than most, if not all, of the other methods mentioned, early indications are that it
promises better accuracy than the traditional techniques.
6.2.2 Joint time-frequency distribution
Joint Time-Frequency Distribution (TFD) analysis originally emerged in the radar signal
processing field, but has started to be adopted for speech processing in recent years [7].
Its good performance in tracing frequency transitions as time progresses has been noted
by speech researchers. Given the importance of pitch in speech systems, it is thus little
surprise that TFD analysis has been attempted for pitch determination.
We will discuss four joint time-frequency distributions, as described in [8], for use
in pitch determination: the Spectrogram Time-Frequency Distribution (STFD), the
Wigner–Ville Distribution (WVD), the Pseudo-Wigner–Ville Distribution (PWVD)
and the Reassigned Smoothed Pseudo-Wigner–Ville Distribution (RSPWVD). They all
attempt to identify the characteristics of frequency as it changes with time, each from
a slightly different point of view. Each of these TFD algorithms has its own strengths
and weaknesses in terms of its sensitivity in detecting pitch features and its
implementation complexity; these are well established in the research literature.
The STFD computes the spectrogram distribution of a discrete-time signal x. It
corresponds to the squared modulus of the short-time Fourier transform (see Section 2.6).
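
The relationship between the STFD and the short-time Fourier transform can be sketched
as follows, in Python rather than MATLAB. The Hamming window, the frame length of 64
samples, the hop of 32 samples and the function name spectrogram are assumptions made
for this example and are not specified in the text. A direct DFT is used instead of an FFT
purely to keep the sketch self-contained.

```python
import cmath
import math

def spectrogram(x, frame_len=64, hop=32):
    """Return |STFT|^2 as a list of frames, each holding frame_len//2 + 1 bins."""
    # Hamming analysis window (an assumed choice)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        power = []
        for k in range(frame_len // 2 + 1):
            # direct DFT of the windowed segment at frequency bin k
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            power.append(abs(X) ** 2)        # squared modulus
        frames.append(power)
    return frames

# Illustrative input: a 1 kHz tone sampled at 8 kHz
fs = 8000
x = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
S = spectrogram(x)
# Index of the strongest bin in each frame; bin k sits at k * fs / frame_len Hz
peak_bins = [max(range(len(p)), key=p.__getitem__) for p in S]
print(len(S), peak_bins)
```

With these settings each bin is 125 Hz wide, so the 1 kHz tone should peak in bin 8 of
every frame. Tracking how such peaks move from frame to frame is the basic idea behind
using a time-frequency distribution to follow pitch.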
