Applied Speech and Audio Processing: With MATLAB Examples
6.1. Analysis toolkit
Figure 6.5 Plot of two 16 kHz sampled speech utterances of the letters C (left) and R (right), with time-domain waveform plots at the top, and frequency-domain spectra plotted below them.

    Nc=length(speech_letter_c);
    Nr=length(speech_letter_r);
    fft_c=fft(speech_letter_c);
    fft_c=abs(fft_c(1:Nc/2));    % magnitude, positive frequencies only
    fft_r=fft(speech_letter_r);
    fft_r=abs(fft_r(1:Nr/2));    % magnitude, positive frequencies only

At this point, we could plot the spectra if required. Remember, we are plotting only the positive frequency components: the FFT output is as long as the original speech array, so for simplicity we take only the lower half of it, and we take the absolute value because the FFT output is complex.

When trying this example, Matlab may complain that only integers can be used for 'colon operator' indexing. This happens if the length of the original speech array is not even (in fact, considering the further subdivision we are about to perform, we should really ensure that the speech array lengths are multiples of four).

Next we can simply sum the frequency elements within the required ranges (in this case, the lower half and the upper half of the positive frequencies respectively), dividing each sum by the number of elements to give a mean magnitude for that band:

    c_lowf=sum(fft_c(1:Nc/4))/(Nc/4);
    c_highf=sum(fft_c(1+Nc/4:Nc/2))/(Nc/4);
    r_lowf=sum(fft_r(1:Nr/4))/(Nr/4);
    r_highf=sum(fft_r(1+Nr/4:Nr/2))/(Nr/4);

For the example spectra plotted in Figure 6.5, the results are telling. Splitting the spectrum in half, the mean magnitude of the lower-half frequency components is 0.74 for R and 0.87 for C. For the higher-half frequency components, R scores 0.13 while C scores 2.84. However, it is the ratios of these that are particularly meaningful. The letter C has a high-frequency to low-frequency ratio of 3.3, but the letter R scores only 0.18. These figures indicate that much of the energy in the spoken letter C is at higher frequencies, whilst much of the energy in the spoken letter R is at lower frequencies.
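As a hedged illustration of the band averaging described above, the following sketch runs the same steps on a synthetic signal, since the recordings speech_letter_c and speech_letter_r are not reproduced here. The test signal x, its two tone frequencies and the trimming step are assumptions introduced for illustration only. Trimming the array to a multiple of four samples is one simple way to avoid the non-integer indexing complaint.

```matlab
% Sketch only: x is a synthetic stand-in for a recorded utterance
% (an assumption, not data from the text).
fs = 16000;                        % sample rate, as in Figure 6.5
t = (0:fs/4)/fs;                   % 4001 samples: deliberately an odd length
x = sin(2*pi*200*t) + 0.1*sin(2*pi*6000*t);   % strong low tone, weak high tone

N = 4*floor(length(x)/4);          % largest multiple of four <= length(x)
x = x(1:N);                        % trim so that N/2 and N/4 are integers

X = abs(fft(x));                   % magnitude spectrum
X = X(1:N/2);                      % keep positive frequencies (0 to fs/2)

lowf  = sum(X(1:N/4))/(N/4);       % mean magnitude, 0 to fs/4
highf = sum(X(1+N/4:N/2))/(N/4);   % mean magnitude, fs/4 to fs/2
fprintf('low = %.3f, high = %.3f\n', lowf, highf);
```

Because the synthetic signal is dominated by its 200 Hz tone, the lower-band mean comes out larger than the upper-band mean, which is the same kind of imbalance reported for the letter R.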
The difference between the two letters is visible in the spectral plots of Figure 6.5, but we have just created a measure that a computer can evaluate automatically:

    c_ratio=c_highf/c_lowf;
    r_ratio=r_highf/r_lowf;

Although this is a relatively trivial example, it is possible, for one speaker, to identify spoken letters by segmenting them (isolating an analysis frame that contains a single letter, and even subdividing this), performing an FFT, then examining the ratio of the summed frequency components across different regions of the frequency spectrum. Unfortunately this technique cannot normally be generalised to work for the speech of many different people.

6.1.5 Cepstral analysis

The cepstrum was introduced in Section 2.6.2.2, where an example of the technique was presented as a useful method of visualising speech signals. As with many other visualisation methods, the useful information that the human eye can notice in a plot can also be extracted and analysed by computer.

The usefulness of the cepstrum derives from the fact that it is the inverse FFT of the logarithm of the FFT. In general terms, this means that the amplitudes of the frequency components are scaled logarithmically before being transformed back. In mathematics, one of the principles of the logarithm is that if something is the multiplicative combination of two items, then in the logarithmic domain these items are combined additively. Put another way, if a signal under analysis, y(t), can be said to be equal to h(t) multiplied by x(t), then:

    y(t) = h(t) × x(t)                      (6.4)
    log[y(t)] = log[h(t)] + log[x(t)]

Relating back to speech signals, x(t) may well be a pitch component, while h(t) is a vocal tract component. Equation (6.4) states the principle in general form. For speech, the two components are convolved in the time domain, so it is their spectra that are related multiplicatively; taking the logarithm makes them additive, and they remain additive in the cepstral domain. In a cepstral plot then, the pitch component, for instance, would be visible in its own right, separated from the vocal tract component. This is illustrated in Figure 6.6, plotted using the method of Section 2.6.2.2. The most likely position of the fundamental pitch period component, at index position 64, has been selected.
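The separation described above can be sketched in Matlab as follows. This is a minimal illustration on a synthetic signal rather than the data behind Figure 6.6, and it is not necessarily the exact method of Section 2.6.2.2. The impulse-train excitation, the two-pole resonator standing in for the vocal tract, the hand-built Hamming window and the search range of 40 to 100 samples are all assumptions. The pitch period of 64 samples is chosen only to echo the index position mentioned in the text.

```matlab
% Sketch only: synthetic voiced-like frame (all parameters are assumptions).
fs = 16000;                      % assumed sample rate
N  = 1024;                       % analysis frame length
P  = 64;                         % assumed pitch period in samples

e = zeros(1, N);
e(1:P:N) = 1;                    % impulse train: the pitch component

r = 0.95;                        % pole radius of a simple resonator
theta = 2*pi*700/fs;             % resonance near 700 Hz
x = filter(1, [1 -2*r*cos(theta) r^2], e);   % stand-in vocal tract filtering

w = 0.54 - 0.46*cos(2*pi*(0:N-1)/(N-1));     % Hamming window, built by hand
X = fft(x .* w);

% Real cepstrum: inverse FFT of the log magnitude spectrum.
% eps guards against taking the logarithm of zero.
c = real(ifft(log(abs(X) + eps)));

% Search a plausible range of pitch lags, skipping the low-quefrency
% region where the vocal tract component is concentrated.
lags = 40:100;
[~, i] = max(c(lags + 1));       % +1 because Matlab indices start at 1
pitch_lag = lags(i);
fprintf('pitch lag = %d samples (%.1f Hz)\n', pitch_lag, fs/pitch_lag);
```

Restricting the search range is a design choice: the slowly varying vocal tract response occupies the lowest cepstral indices, so excluding them leaves the pitch peak as the dominant feature in the range searched.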