Audio analysis
Figure 6.4
Average magnitude difference function (AMDF) and frame power plots (lower
graph) for speech sampled at 16 kHz, containing a recitation of the alphabet from letter A to
letter G (plotted on the upper graph). The letter C lasts long enough to show almost two
distinct amplitude ‘bumps’, for the /c/ and the /ee/ sounds, separated by a short gap and
spanning the period from approximately 0.7 to 1.2 seconds. Note that the two analysis
measures plotted on the lower graph correspond closely during periods of speech, but less
so in the gaps between words.
6.1.4 Spectral measures
In Chapter 2 we looked at how to use an FFT to determine a frequency spectrum.
In Matlab we created and then plotted the spectrum of a random vector (see
Section 2.3). If, instead of plotting the spectrum directly, we were to analyse the spectral
components, we could use these to build up a spectral measure.
To illustrate this, we will plot the spectra of two different regions of a speech
recording, examine them, and then analyse them further. Both the time-domain and
frequency-domain plots are shown in Figure 6.5, for the spoken letters C and R.
From the time-domain waveform plots in the figure, it can be seen that the C is
probably louder than the R (it has higher amplitude), and also has a slightly longer duration.
The frequency-domain plots show some spikes at low frequencies – most likely formant
frequencies – but also show that the C has more high-frequency components than the R.
Most of the signal power in the R seems to be below about 1 kHz, but much of the signal
power in the C seems to be above this.
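The visual impression that most of a signal's power lies below or above 1 kHz can be checked numerically. The following is a minimal sketch rather than a listing from this chapter: it assumes a 16 kHz sample rate, and it uses two synthetic test signals in place of the recorded letters. All variable names are chosen purely for illustration.

```matlab
fs = 16000;                 % assumed sample rate in Hz
N  = 1024;                  % analysis length in samples
t  = (0:N-1)/fs;            % time axis in seconds

% Synthetic stand-ins (assumptions, not real speech):
x_low  = sin(2*pi*500*t) + 0.1*sin(2*pi*3000*t);   % power mostly below 1 kHz
x_high = 0.1*sin(2*pi*500*t) + sin(2*pi*3000*t);   % power mostly above 1 kHz

% Frequency in Hz of each bin in the lower half of the spectrum
f = (0:N/2-1)*fs/N;

% Power spectra, keeping only the lower (non-mirrored) half
P_low  = abs(fft(x_low)).^2;
P_low  = P_low(1:N/2);
P_high = abs(fft(x_high)).^2;
P_high = P_high(1:N/2);

% Fraction of the spectral power that lies below 1 kHz
frac_low  = sum(P_low(f < 1000))  / sum(P_low);
frac_high = sum(P_high(f < 1000)) / sum(P_high);
```

A fraction close to 1 indicates a signal dominated by low frequencies, as the R appears to be, and a fraction close to 0 indicates one dominated by components above 1 kHz, as the C appears to be.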
In Matlab we can devise a measure to compare the low-frequency spectral power
to the high-frequency spectral power. To do this, first we need to derive the spectra (as
plotted in the lower half of Figure 6.5) and name them fft_c and fft_r to denote the
spectra of spoken letter C and spoken letter R, respectively:
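A minimal sketch of this step follows. It assumes that the two segments have already been cut from the recording into vectors, here given the hypothetical names speech_c and speech_r. So that the fragment runs on its own, they are filled with noise rather than real speech. Only fft_c and fft_r are names taken from the text.

```matlab
% Stand-in data (assumption): replace with the extracted speech segments
speech_c = randn(1, 4096);
speech_r = randn(1, 3072);

Nc = length(speech_c);
Nr = length(speech_r);

% Magnitude spectra. Only the first half is kept, because the spectrum
% of a real-valued signal is mirrored about the Nyquist frequency.
fft_c = abs(fft(speech_c));
fft_c = fft_c(1:floor(Nc/2));
fft_r = abs(fft(speech_r));
fft_r = fft_r(1:floor(Nr/2));
```

Because the two segments differ in length, the two spectra contain different numbers of bins. Any comparison between them is therefore better made in terms of frequency, or of a fraction of the total bin count, than in terms of raw bin index.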