Applied Speech and Audio Processing: With MATLAB Examples

Audio analysis

6.1. Analysis toolkit
Figure 6.5: Plot of two 16 kHz sampled speech utterances of the letters C (left) and R (right),
with time-domain waveform plots at the top, and frequency-domain spectra plotted below them.
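Nc=length(speech_letter_c);   % number of samples in each recording
Nr=length(speech_letter_r);
fft_c=fft(speech_letter_c);   % complex spectrum, same length as the input
fft_c=abs(fft_c(1:Nc/2));     % magnitude of the lower half only
fft_r=fft(speech_letter_r);
fft_r=abs(fft_r(1:Nr/2));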
At this point, we could plot the spectra if required. Remember, we are plotting only the
positive frequency components: the FFT output is as long as the original speech
array, so for simplicity we take only its lower half, and we take the absolute value
because the FFT output is complex. When trying this example, Matlab may complain
that only integers may be used with the ‘colon operator’ when indexing.
This happens if the length of an original speech array is not even (in fact,
considering the further subdivision we are about to perform, we should really ensure
that the speech array sizes are a multiple of four).
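One way to avoid the indexing complaint is to trim each array before taking the FFT. The following is a minimal sketch, assuming it is acceptable to discard up to three trailing samples; the random vector is only a hypothetical stand-in for a recorded utterance of awkward length.

```matlab
% Hypothetical stand-in for a recorded utterance with an awkward length.
speech_letter_c = randn(1003,1);

% Discard up to three trailing samples so the length divides by four.
Nc = 4*floor(length(speech_letter_c)/4);
speech_letter_c = speech_letter_c(1:Nc);

fft_c = fft(speech_letter_c);
fft_c = abs(fft_c(1:Nc/2));      % Nc/2 is now guaranteed to be an integer
```

Zero-padding the array up to the next multiple of four would be an alternative that keeps every sample.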
Next we can simply sum up the frequency elements within the required ranges (in this
case, the lower half frequencies and the upper half frequencies respectively):
c_lowf=sum(fft_c(1:Nc/4))/(Nc/4);        % mean magnitude, lower half of the spectrum
c_highf=sum(fft_c(1+Nc/4:Nc/2))/(Nc/4);  % mean magnitude, upper half of the spectrum
r_lowf=sum(fft_r(1:Nr/4))/(Nr/4);
r_highf=sum(fft_r(1+Nr/4:Nr/2))/(Nr/4);
For the example spectra plotted in Figure 6.5, the results are telling. Splitting the spectrum
in half, the mean absolute lower half frequency components for R are 0.74, and for
C are 0.87. For the mean absolute higher half frequency components, R scores 0.13
while C scores 2.84. However, it is the ratios of these that are particularly meaningful.
The letter C has a high-frequency to low-frequency ratio of 3.3, but the letter R scores
only 0.18. These figures indicate that much of the energy in the spoken letter C is higher
frequency, whilst much of the energy in the spoken letter R is lower frequency. Indeed
we can visualise this by looking at the spectral plots, but we have just created a measure
that can be performed automatically by a computer:
c_ratio=c_highf/c_lowf;
r_ratio=r_highf/r_lowf;
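To see how the measure behaves on signals whose content is known, the same steps can be applied to synthetic tones. This is a sketch only: the 16 kHz sample rate matches Figure 6.5, but the tone frequencies, amplitudes and array length are assumptions chosen so that each tone falls exactly on an FFT bin.

```matlab
fs = 16000;                      % sample rate in Hz, as in Figure 6.5
N  = 4000;                       % array length, a multiple of four (assumed)
t  = (0:N-1)'/fs;

% Two hypothetical test signals: one mostly low frequency, one mostly high.
signals = {sin(2*pi*500*t) + 0.1*sin(2*pi*6000*t), ...
           0.1*sin(2*pi*500*t) + sin(2*pi*6000*t)};

ratio = zeros(1,2);
for i = 1:2
    spec  = abs(fft(signals{i}));
    spec  = spec(1:N/2);                      % 0 to fs/2 only
    lowf  = sum(spec(1:N/4))/(N/4);           % mean magnitude, 0 to 4 kHz
    highf = sum(spec(1+N/4:N/2))/(N/4);       % mean magnitude, 4 to 8 kHz
    ratio(i) = highf/lowf;                    % high-to-low frequency ratio
end
```

The first signal is dominated by the 500 Hz tone and the second by the 6 kHz tone, so the first ratio comes out well below one and the second well above it, mirroring the pattern of the R and C results.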
Although this is a relatively trivial example, it is possible, for one speaker, to identify
spoken letters by segmenting them (isolating an analysis frame that contains a single
letter, and even subdividing this), performing an FFT, then examining the ratio of
the summed frequency components across different regions of the frequency spectrum.
Unfortunately this technique cannot normally be generalised to work for the speech of
many different people.
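Examining several regions of the spectrum rather than two can be sketched as below. The number of bands B, the test tone and the normalisation are assumptions made for illustration, and the code requires half the array length to be divisible by B.

```matlab
fs = 16000; N = 4000; t = (0:N-1)'/fs;
s = sin(2*pi*5000*t);            % hypothetical analysis frame: a 5 kHz tone
B = 4;                           % number of equal-width bands (assumed)

spec = abs(fft(s));
spec = spec(1:N/2);              % 0 to fs/2 only

% Each column of the reshaped matrix holds the bins of one band.
bands = mean(reshape(spec, (N/2)/B, B), 1);
bands = bands/sum(bands);        % relative share held by each band

[~, strongest] = max(bands);     % index of the dominant band
```

With four bands at a 16 kHz sample rate, each band spans 2 kHz, so the 5 kHz tone lands in the third band. The vector of band shares could then be compared between frames in the same way as the two-band ratio.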
6.1.5. Cepstral analysis
The cepstrum was introduced in Section 2.6.2.2 where an example of the technique
was presented as a useful method of visualising speech signals. As with many other
visualisation methods, the useful information that the human eye can notice in a plot can
also be extracted and analysed by computer.
The usefulness of the cepstrum derives from the fact that it is the inverse FFT of the
logarithm of the FFT. In general terms, this means that the frequency components have
been ordered logarithmically. In mathematics, one of the principles of the logarithm is
that if something is the multiplicative combination of two items, then in the logarithmic
domain, these items are combined additively. Put another way, if a signal under analysis,
y(t), can be said to be equal to h(t) multiplied by x(t), then:
\begin{equation}
\begin{aligned}
y(t) &= h(t) \times x(t) \\
\log[y(t)] &= \log[h(t)] + \log[x(t)].
\end{aligned}
\tag{6.4}
\end{equation}
Relating back to speech signals, x(t) may well be a pitch component, while h(t) is a
vocal tract component. Strictly speaking, the two are combined by convolution in the
time domain, which corresponds to multiplication of their spectra in the frequency
domain, and it is that product which the logarithm converts into a sum. In the
cepstrum domain, then, they are related additively. In a cepstral plot the pitch component,
for instance, would be visible in its own right, separated from the vocal tract component.
This has been illustrated in Figure 6.6, plotted using the method of Section 2.6.2.2.
The most likely position of the fundamental pitch period component, at index position
64, has been selected.
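Selecting the pitch component automatically can be sketched as follows. Section 2.6.2.2 is not reproduced here, so the cepstrum below simply follows the definition given above (inverse FFT of the logarithm of the FFT magnitude). The synthetic frame, the resonator parameters, the explicit Hamming window and the search limits are all assumptions chosen for illustration, not values taken from Figure 6.6.

```matlab
% Synthetic voiced-speech-like frame: an impulse train (pitch) passed
% through a two-pole resonator (a crude stand-in for the vocal tract).
N = 512;                         % frame length in samples (assumed)
P = 64;                          % pitch period in samples (assumed)
x = zeros(N,1);
x(1:P:N) = 1;                    % excitation: one pulse every P samples
r = 0.9; theta = pi/4;           % pole radius and angle (arbitrary choices)
y = filter(1, [1, -2*r*cos(theta), r^2], x);

% Hamming window written out explicitly so that no toolbox is needed.
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));

% Cepstrum: inverse FFT of the logarithm of the FFT magnitude.
% Adding eps guards against taking log(0).
ceps = abs(ifft(log(abs(fft(w.*y)) + eps)));

% Search a range of lags that excludes the low-index region dominated
% by the vocal tract envelope. Array index k corresponds to a lag of k-1.
lo = 20; hi = 200;               % search limits in samples (assumed)
[~, k] = max(ceps(lo+1:hi+1));
pitch_period = k + lo - 1;       % estimated pitch period in samples
```

For this synthetic frame the search should return a lag at or very close to the 64-sample period used to build it. On real speech the search limits would need to reflect the sample rate and the range of pitch expected from the speaker.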
