Applied Speech and Audio Processing: With matlab examples

6.1. Analysis toolkit

Audio analysis
Figure 6.8: Deviation, bias and shift LSP analysis features collected for a 16 kHz sampled speech utterance from the TIMIT database, with the waveform plotted on top.
In MATLAB this is again easy: assuming $p = 10$, we would simply create the reference LSPs using:
bar_w=[1:10]*pi/11
With the $p$ lines distributed evenly across the spectrum, transforming them into LPC coefficients and calculating the corresponding power spectrum would give a flat spectrum. Dev thus measures how close each frame is to this distribution: with $\beta = 2$ it becomes the Euclidean distance between the actual and comparison distributions, while odd values such as $\beta = 1$ or $\beta = 3$ attribute a sign to the deviation from $\bar{\omega}_i$, so that a positive measure denotes a high-frequency spectral bias and a negative measure a low-frequency spectral bias.
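The flatness claim can be checked numerically. The sketch below assumes an even order $p$ and uses only core MATLAB functions (poly, conv, fft). It rebuilds the LPC polynomial from the reference LSPs through the usual symmetric and antisymmetric polynomials $P(z)$ and $Q(z)$, then evaluates $1/|A|^2$. Variable names other than bar_w are illustrative.

```matlab
% Flat-spectrum reference LSPs for an even order p
p = 10;
bar_w = (1:p)*pi/(p+1);

% For even p, odd-indexed lines are roots of the symmetric P(z) and
% even-indexed lines are roots of the antisymmetric Q(z)
rP = exp(1i*bar_w(1:2:end));
rQ = exp(1i*bar_w(2:2:end));
P = conv(real(poly([rP, conj(rP)])), [1 1]);   % add trivial root at z = -1
Q = conv(real(poly([rQ, conj(rQ)])), [1 -1]);  % add trivial root at z = +1

% A(z) = (P(z) + Q(z))/2; the z^-(p+1) terms cancel, so drop the last entry
a = (P + Q)/2;
a = a(1:p+1);

% LPC power spectrum 1/|A(e^jw)|^2 on a 512-point FFT grid
S = 1 ./ abs(fft(a, 512)).^2;
fprintf('largest |a(2:end)| = %g, spectrum spans [%g, %g]\n', ...
        max(abs(a(2:end))), min(S), max(S));
```

The reconstructed a comes out as 1 followed by $p$ zeros, to within rounding, so the spectrum is constant at 1.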
Each of these measures provides useful information about the underlying speech signal. All three are illustrated in Figure 6.8, applied to a speech recording from the TIMIT database [2] (the deviation plot is given for $\beta = 2$).
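A minimal sketch of the deviation measure follows. It assumes Dev is the mean over the $p$ lines of $(\omega_i - \bar{\omega}_i)^\beta$. Equation (6.7) gives the exact definition, which may differ in normalisation. The frame's LSP values are hypothetical, and $\beta$ should be an integer so that negative differences stay real.

```matlab
p = 10;
bar_w = (1:p)*pi/(p+1);     % flat-spectrum comparison distribution
w = bar_w + 0.05;           % hypothetical frame: every line 0.05 rad higher

% Assumed deviation form: mean over the p lines of (w_i - bar_w_i)^beta
dev = @(w, ref, beta) mean((w - ref).^beta);

d1 = dev(w, bar_w, 1);                 % signed: > 0 means high-frequency bias
d2 = dev(w, bar_w, 2);                 % unsigned: mean squared distance
d1_low = dev(bar_w - 0.05, bar_w, 1);  % lines moved down: sign flips
euclid = norm(w - bar_w);              % Euclidean distance, equal to sqrt(p*d2)
fprintf('beta=1: %+.4f  beta=2: %.4f  beta=1 (lowered): %+.4f  ||w-ref|| = %.4f\n', ...
        d1, d2, d1_low, euclid);
```

Under this assumed form, $\beta = 2$ gives the mean squared distance, so sqrt(p*d2) recovers the Euclidean norm of the difference vector. With $\beta = 1$, moving every line down instead of up flips the sign.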
Shift indicates the predominant movement of the LSP frequency distribution between consecutive frames. Consider two adjacent frames, one containing unvoiced and the other voiced speech: the LSP distribution of the unvoiced frame will be high-frequency biased and that of the voiced frame low-frequency biased. We saw a similar effect when comparing the spoken C and R spectra in Section 6.1.4. A large difference between the two frames gives a large measure value. As Figure 6.8 shows, the shift measure peaks at obvious speech waveform changes and may thus be useful for speech segmentation.
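The sketch below computes a shift value between each pair of consecutive frames on hypothetical LSP data. It assumes the measure is the mean, over the $p$ lines, of the change in each line frequency from one frame to the next, which is positive when the distribution moves upward. The text's own definition may differ in detail.

```matlab
p = 10;
bar_w = (1:p)*pi/(p+1);

% Hypothetical LSP tracks, one frame per row: frames 1-3 low-frequency
% biased (voiced-like), frames 4-6 high-frequency biased (unvoiced-like)
offsets = [-0.10; -0.09; -0.11; 0.12; 0.10; 0.11];
nf = numel(offsets);
W = repmat(bar_w, nf, 1) + repmat(offsets, 1, p);

% Assumed shift form: mean change of each line between consecutive frames
shift = mean(diff(W, 1, 1), 2).';   % nf-1 values, one per frame boundary

[~, c] = max(abs(shift));
fprintf('shift = %s, largest between frames %d and %d\n', ...
        mat2str(shift, 3), c, c + 1);
```

The largest magnitude falls between frames 3 and 4, where the hypothetical distribution jumps from low-frequency to high-frequency biased. This mirrors the segmentation behaviour described in the text.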
Bias indicates frequency trends within the current frame, that is, whether its spectrum is high-frequency or low-frequency biased. It is similar to the deviation measure, which determines how close the LSP distribution of the current frame is to a predetermined comparison distribution. In Figure 6.8, bias registers high values for fricatives, reflecting the predominance of their high-frequency components.
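Fricative-like spectra give positive values, and the sketch below shows why. It converts two hypothetical order-10 LPC filters to LSPs, one with high-frequency and one with low-frequency emphasis, and compares their lines with the flat reference. The conversion takes the angles of the roots of $P(z)$ and $Q(z)$ using roots. The bias is assumed here to be the signed mean offset of the lines from $\bar{\omega}_i$, which may not match the text's exact formula.

```matlab
p = 10;
bar_w = (1:p)*pi/(p+1);

% LPC -> LSP: angles of the roots of P(z) = A(z) + z^-(p+1) A(1/z) and
% Q(z) = A(z) - z^-(p+1) A(1/z), keeping those strictly inside (0, pi)
rootsP = @(a) roots([a 0] + [0 fliplr(a)]);
rootsQ = @(a) roots([a 0] - [0 fliplr(a)]);
pick = @(t) sort(t(t > 1e-6 & t < pi - 1e-6)).';
lsp_of = @(a) pick(angle([rootsP(a); rootsQ(a)]));

% Hypothetical order-p LPC filters 1/A(z), each with one real pole
a_hi = [1  0.7 zeros(1, p-1)];   % pole at z = -0.7: high-frequency emphasis
a_lo = [1 -0.7 zeros(1, p-1)];   % pole at z = +0.7: low-frequency emphasis

% Assumed bias form: signed mean offset of the lines from bar_w
bias = @(w) mean(w - bar_w);
w_hi = lsp_of(a_hi);
w_lo = lsp_of(a_lo);
b_hi = bias(w_hi);
b_lo = bias(w_lo);
fprintf('bias: high-frequency filter %+.4f, low-frequency filter %+.4f\n', b_hi, b_lo);
```

The two filters are mirror images about $\pi/2$, so their bias values come out equal and opposite.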
Where the speech is contaminated by noise with a particularly well-defined spectral shape, setting the comparison distribution $\bar{\omega}$ of Equation (6.7) to represent that shape may make the Dev measure reasonably insensitive to the noise when averaged. In other words, analysing the noise itself will produce a zero-mean output.
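The zero-mean property can be illustrated on hypothetical noise-only frames. The sketch assumes Dev with $\beta = 1$ in the mean-offset form. This is an assumption; Equation (6.7) gives the exact definition. It sets the comparison distribution to the average noise LSPs and compares the result with the flat reference. The noise LSPs are synthetic: a fixed upward tilt plus small deterministic jitter.

```matlab
p = 10;
nf = 8;                                  % number of noise-only frames
bar_w = (1:p)*pi/(p+1);                  % flat-spectrum reference

% Hypothetical noise LSPs: fixed upward tilt plus small deterministic jitter
tilt = 0.08;
jitter = 0.01*sin((1:nf)' * (1:p));      % nf x p
Wn = repmat(bar_w + tilt, nf, 1) + jitter;

% Comparison distribution matched to the noise: average noise LSPs
noise_ref = mean(Wn, 1);

% Assumed Dev form with beta = 1, one value per frame
dev_noise = mean(Wn - repmat(noise_ref, nf, 1), 2);
dev_flat  = mean(Wn - repmat(bar_w, nf, 1), 2);
fprintf('average Dev: noise-matched ref %g, flat ref %g\n', ...
        mean(dev_noise), mean(dev_flat));
```

Averaged over the noise frames, the noise-matched reference gives zero to within rounding. The flat reference instead reports the tilt as a spurious high-frequency bias.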
We can also use LSP data to estimate the positions of spectral peaks within an analysis frame. Peaks are located approximately halfway between pairs of closely spaced lines, with the peak power related to the closeness of the lines. Ranking the few most closely spaced line pairs by narrowness will generally reproduce the ordering of the corresponding peaks by power. It must be noted, however, that in some cases, especially where three lines are close together, the correspondence is far less predictable. For unvoiced frames, other speech resonances are similarly reflected in the LSP distribution, although the visual correspondence when plotted is far less dramatic than for strongly voiced speech.
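The peak-ranking idea can be sketched as follows. A hypothetical order-4 LPC filter is built from a strong resonance (pole radius 0.98 at 0.7 rad) and a weaker one (radius 0.9 at 2.0 rad). Its LSPs are found from the roots of $P(z)$ and $Q(z)$, adjacent line pairs are sorted by spacing, and each pair's midpoint is taken as a peak estimate. This is an illustrative approximation only, and it becomes unreliable where three lines bunch together.

```matlab
% Hypothetical order-4 LPC filter: strong resonance (radius 0.98 at 0.7 rad)
% and weaker resonance (radius 0.9 at 2.0 rad)
r = [0.98 0.90];
theta = [0.7 2.0];
poles = [r .* exp(1i*theta), r .* exp(-1i*theta)];
a = real(poly(poles));                   % A(z) coefficients, a(1) = 1

% LPC -> LSP via the roots of P(z) and Q(z), keeping angles in (0, pi)
rootsP = @(a) roots([a 0] + [0 fliplr(a)]);
rootsQ = @(a) roots([a 0] - [0 fliplr(a)]);
pick = @(t) sort(t(t > 1e-6 & t < pi - 1e-6)).';
w = pick(angle([rootsP(a); rootsQ(a)]));   % ascending line frequencies

% Rank adjacent line pairs by spacing, narrowest first, and take
% each pair's midpoint as the estimated peak frequency
gaps = diff(w);
[~, order] = sort(gaps);
peak_est = (w(order) + w(order + 1)) / 2;
fprintf('peak estimates (rad), strongest first: %s\n', mat2str(peak_est, 3));
```

The narrowest pair lands close to the strong 0.7 rad resonance and the next narrowest near the weaker one at 2.0 rad. The midpoints are slightly offset from the true pole angles.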
