Applied Speech and Audio Processing: With Matlab Examples

6.1. Analysis toolkit

[Plot for Figure 6.2, titled ‘Audio analysis’: upper panel, Amplitude of the recording; lower panel, ZCR versus Analysis frame]
Figure 6.2
A 16 kHz sampled speech recording ‘its speech’, containing several stressed high-frequency sibilants (plotted in the upper graph), and the corresponding ZCR measure (lower graph). The excursions of the ZCR plot follow the high-frequency speech components, namely a breathy /i/ at the start, both /s/ sounds, and the final /ch/.
Figure 6.3
Threshold-crossing rate illustrated for a noisy sinewave, showing that the extra zero
crossings of Figure 6.1(b) are no longer present.
In most ways, TCR results are very similar to ZCR results. In the absence of noise, a TCR plot for a speech recording would resemble the ZCR plot in Figure 6.2.
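The effect shown in Figure 6.3 can be reproduced with a short Matlab sketch. It assumes that ZCR counts sign changes between adjacent samples divided by the frame length, and that TCR only counts a crossing when the signal moves from above +T to below -T or the reverse (a hysteresis band). The threshold T = 0.2 and the alternating ±0.05 'noise', used instead of random noise so that the result is repeatable, are hypothetical choices rather than values from the text.

```matlab
% ZCR versus a hysteresis-style threshold-crossing rate (TCR).
% Assumptions: ZCR = sign changes / N; TCR counts moves between the
% regions above +T and below -T only. T and the noise are hypothetical.
N = 400;
n = 0:N-1;
clean = sin(2*pi*(n + 0.5)/100);   % 4 cycles, no sample exactly at zero
noisy = clean + 0.05*(-1).^n;      % deterministic stand-in for noise
T = 0.2;                           % hypothetical threshold

zcrOf = @(x) sum(abs(diff(sign(x))) > 0)/length(x);

sigs = {clean, noisy};
zcrVals = zeros(1, 2);
tcrVals = zeros(1, 2);
for k = 1:2
  x = sigs{k};
  zcrVals(k) = zcrOf(x);
  state = 0;   % 0 = not yet outside the band, +1 above +T, -1 below -T
  count = 0;
  for m = 1:length(x)
    if x(m) > T
      newState = 1;
    elseif x(m) < -T
      newState = -1;
    else
      newState = state;   % inside the band: keep the previous state
    end
    if state ~= 0 && newState ~= state
      count = count + 1;  % a full swing from one side of the band to the other
    end
    state = newState;
  end
  tcrVals(k) = count/length(x);
end
fprintf('ZCR clean %.4f, noisy %.4f\n', zcrVals(1), zcrVals(2));
fprintf('TCR clean %.4f, noisy %.4f\n', tcrVals(1), tcrVals(2));
```

With these settings both measures give the same rate for the clean sinewave, but only the ZCR rises when the noise is added, since the noise is far too small to carry the signal across the whole ±T band.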
6.1.2 Frame power
This is a measure of the signal energy over an analysis frame, and is calculated from the sum of the squared magnitudes of the sample values in that frame. For speech frame i, with N elements denoted by $x_i(n)$, the frame power measure is determined from:
$$E_i = \frac{1}{N}\sum_{n=0}^{N-1} \lvert x_i(n) \rvert^2 \tag{6.2}$$
As with ZCR in Section 6.1.1, the division by N will often be unnecessary in practice: when every frame has the same length, it merely scales all results by the same constant.
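A small worked instance of equation (6.2) may help. The four-sample frame $x_i = (1, -2, 2, -1)$, so N = 4, is arbitrary and chosen only for illustration:

$$E_i = \frac{1}{4}\left[1^2 + (-2)^2 + 2^2 + (-1)^2\right] = \frac{10}{4} = 2.5$$

Omitting the division by N would give 10 instead.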

In Matlab, that is a very simple formula:
function [fpow]=fpow(segment)
% Frame power, equation (6.2): mean of the squared sample values
fpow=sum(segment.^2)/length(segment);
Frame power provides a compact representation of the amplitude of the speech. As we
have seen in Section 3.2.3, unvoiced speech is spoken with less power than voiced
speech, and for this reason, frame power provides another indicator of voiced/unvoiced
speech.
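To illustrate frame power as a voiced/unvoiced indicator, the following Matlab sketch frames a synthetic signal into non-overlapping 128-sample frames and flags frames whose power exceeds a threshold. The signal (a strong 200 Hz tone followed by a weak 5 kHz tone) and the threshold of 10% of the maximum frame power are assumptions made purely for illustration, not values from the text, and fpow is written as an anonymous function only so that the script is self-contained.

```matlab
% Voiced/unvoiced labelling from frame power alone (hypothetical set-up).
fs = 16000;                                  % sample rate in Hz
L = 128;                                     % frame length in samples
t = (0:8*L-1)/fs;
voicedLike = 0.8*sin(2*pi*200*t(1:4*L));     % strong, low-frequency part
unvoicedLike = 0.05*sin(2*pi*5000*t(4*L+1:end)); % weak, high-frequency part
x = [voicedLike, unvoicedLike];

fpow = @(segment) sum(segment.^2)/length(segment);  % equation (6.2)
frames = reshape(x, L, []);                  % one non-overlapping frame per column
E = zeros(1, size(frames, 2));
for i = 1:size(frames, 2)
  E(i) = fpow(frames(:, i));
end
isVoiced = E > 0.1*max(E);                   % hypothetical threshold
disp(isVoiced)
```

The first four frames are flagged as voiced and the last four are not. With real speech a fixed fraction of the maximum is a crude rule, which is one reason frame power is usually described as one indicator among several.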
The simplicity of the Matlab function above hides the fact that this calculation requires a multiplication for each and every sample within the analysis window. In implementation terms this can be relatively ‘expensive’, prompting simplification efforts which led directly to the AMDF below. In fact, the similarity of the two measures is illustrated in Figure 6.4, which plots the frame power and AMDF measures together for an example recording of the first seven letters of the alphabet. The speech was recorded at a 16 kHz sample rate, and the analysis was performed on non-overlapping 128-sample frames. Each plot is scaled to a maximum of 1.0 for comparison purposes.
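The analysis set-up described for Figure 6.4 (non-overlapping 128-sample frames at 16 kHz, with each track scaled to a maximum of 1.0) can be sketched in Matlab as follows. The recording itself is not available here, so a hypothetical 300 Hz tone with a slowly rising and falling envelope stands in for speech, and the number of frames is likewise arbitrary. The AMDF line uses equation (6.3) from Section 6.1.3.

```matlab
% Frame power and AMDF per frame, each track scaled to a maximum of 1.0.
fs = 16000;                          % sample rate quoted in the text
L = 128;                             % frame length quoted in the text
nFrames = 20;                        % hypothetical number of frames
n = 0:L*nFrames-1;
envelope = sin(pi*n/numel(n));       % hypothetical slow rise and fall
x = envelope .* sin(2*pi*300*n/fs);  % hypothetical 300 Hz test tone

frames = reshape(x, L, nFrames);     % non-overlapping frames, one per column
fp = sum(frames.^2, 1)/L;            % frame power, equation (6.2)
am = sum(abs(frames), 1)/L;          % AMDF, equation (6.3)

fpScaled = fp/max(fp);               % scale each track to a maximum of 1.0
amScaled = am/max(am);
disp([(1:nFrames)', fpScaled', amScaled']);
```

Reshaping the signal into one column per frame lets both measures be computed for every frame with a single sum along the first dimension, instead of an explicit loop over frames.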
6.1.3 Average magnitude difference function
The average magnitude difference function is designed to provide much of the information of the frame power measure, but without multiplications, by replacing the square of each sample with its absolute value:
$$\mathrm{AMDF}_i = \frac{1}{N}\sum_{n=0}^{N-1} \lvert x_i(n) \rvert \tag{6.3}$$
In Matlab, it is again very simple:
function [amdf]=amdf(segment)
% AMDF, equation (6.3): mean of the absolute sample values
amdf=sum(abs(segment))/length(segment);
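Because the AMDF needs only absolute values and additions, it maps naturally onto integer hardware. The sketch below illustrates this under stated assumptions rather than reproducing a method from the text: it takes a hypothetical frame of 16-bit samples and, since the frame length of 128 is a power of two, replaces the division by N with a 7-bit right shift (which rounds down).

```matlab
% Integer-only AMDF for one 128-sample frame of 16-bit samples.
% Assumption: N is a power of two (2^7 = 128), so 1/N becomes a right shift.
N = 128;
n = 0:N-1;
frame = int16(round(8000*sin(2*pi*n/32)));  % hypothetical int16 test frame

acc = sum(abs(int32(frame)), 'native');     % absolute values and additions only
amdfInt = bitshift(acc, -7);                % divide by 128 = 2^7, rounding down

amdfRef = sum(abs(double(frame)))/N;        % floating-point equation (6.3)
fprintf('integer AMDF %d, floating-point AMDF %.3f\n', amdfInt, amdfRef);
```

The accumulator is widened to int32 so that summing 128 absolute 16-bit values cannot overflow.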
An illustration of the AMDF obtained for a sequence of recorded speech is shown in Figure 6.4, which also plots the frame power measure and shows the close correspondence between the two. Both output high values when speech power is high (such as the /a/ sound in the letter A and both the /c/ and /ee/ sounds of the letter C), and low values when speech power is low.
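Although the correspondence between frame power and AMDF appears quite close in this plot, note that the AMDF output is higher than frame power during the gaps between words. This follows from the definitions: scaling a frame by a factor a > 0 scales its AMDF by a but its frame power by a², so once both tracks are scaled to a maximum of 1.0, low-level frames such as background noise in the gaps appear relatively larger in the AMDF. It is an indicator that the AMDF may be less immune to confusion by noise than the frame power measure.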
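The effect of signal level on the two scaled measures can be checked numerically with a hypothetical pair of frames: one waveform at full level, and the same waveform at one tenth of that level, standing in for low-level noise between words. Using an identical waveform shape is a simplification so that the ratios come out exactly.

```matlab
% Scaled frame power and AMDF of a quiet frame relative to a loud one.
L = 128;
n = 0:L-1;
base = sin(2*pi*n/32);     % hypothetical waveform shape
loud = base;
quiet = 0.1*base;          % hypothetical low-level frame

fpow = @(segment) sum(segment.^2)/length(segment);   % equation (6.2)
amdf = @(segment) sum(abs(segment))/length(segment); % equation (6.3)

powScaled = fpow(quiet)/fpow(loud);    % grows with amplitude squared
amdfScaled = amdf(quiet)/amdf(loud);   % grows linearly with amplitude
fprintf('quiet frame, scaled: frame power %.3f, AMDF %.3f\n', powScaled, amdfScaled);
```

The printed values are 0.010 for frame power and 0.100 for the AMDF.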
