Applied Speech and Audio Processing: With MATLAB Examples
2.6. Visualisation
The arguments to the axis commands that define the y-dimension (from −0.05 to 0.05 and from −0.2 to 0.2, respectively) will need to be changed depending on the amplitude and composition of the speech segment that you are working with, but if chosen correctly they will ensure that the plots fill the entire plot area. The resulting plot is reproduced in Figure 2.9. Note from the figure that the first major peak identified in the correlogram, at an x distance of 0.006875 s (6.875 ms), corresponds to the main period in the speech plot above it. Try measuring the distance between speech peaks with a ruler, and then compare this with the distance from the y-axis to the first identified peak in the correlogram. This illustrates the main use of the technique for audio analysis: detecting periodicities.

2.6.2.2 Cepstrum

The name 'cepstrum' comes from reversing the first half of the word 'spectrum'. A cepstrum plots the amplitude of a signal against its 'quefrency', which is actually inverse frequency and so has units of time. Evidently neither word was chosen for ease of pronunciation. However, the technique is particularly good at separating the components of complex signals made up of several simultaneous but different elements combined together, such as speech (as we will see in Chapter 3).

The cepstrum is generated as the Fourier transform of the log of the Fourier transform of the original signal [4]. Yes, there really are two Fourier transform steps, although in practice the second one is often performed as an inverse Fourier transform instead [5].²

Using Matlab again, a very simple example would be to plot the cepstrum of the speech analysed above with the correlogram.
This is fairly simple to plot; the result is not quite the cepstrum in its original sense, but it is certainly useful enough:

    ps=log(abs(fft(hamming(length(segment)).*segment)));
    plot(abs(ifft( ps )));

Most likely, if the speech segment were as large as that used for the correlogram example, the resulting cepstrum would have a huge DC peak, and much of the detail in the plot would be obscured. It is possible to zoom in on sections of the plot either by using the Matlab viewing tools or by reducing the size of the original speech segment being analysed.

Within a cepstral plot, there should be a peak visible at the same index position as the peak in the correlogram. The index is a lag measured in samples, so, for example, a peak at x-index 56 in the 256-point cepstrum of 8 kHz speech (a lag of 55 samples, because Matlab indices start at 1) would correspond to a period of 55/8000 s = 6.875 ms, or a frequency of 8000/55 ≈ 145 Hz. This method of analysis will be illustrated later in Section 6.1.5.

It seems that both the correlogram and the cepstrum can reveal a fundamental frequency. Both methods, while accomplishing similar tasks, have unique strengths: peak

² Technically the method described here is the discrete time power cepstrum, arguably the most useful of the cepstral techniques.
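The step from a cepstral peak to a fundamental frequency can be sketched as follows, treating the peak position as a lag in samples. This is only an illustrative sketch under stated assumptions: the frame is a synthetic train of decaying pulses rather than recorded speech, and the names fs, P, N, r, minLag and maxLag are hypothetical choices that do not come from the text. No window is applied because the synthetic frame is built to hold exactly eight pulse periods; for real speech the Hamming window used in the listing above should be kept.

```matlab
% Sketch: locate the cepstral peak of a synthetic frame and convert it to Hz.
% All names and values here are illustrative assumptions, not the book's data.
fs = 8000;        % sample rate in Hz (assumed, matching the 8 kHz example)
P  = 55;          % pulse spacing in samples (6.875 ms at 8 kHz)
N  = 8 * P;       % frame length: exactly eight pulse periods
r  = 0.6;         % amplitude decay from one pulse to the next

% Synthetic stand-in for a voiced frame: a train of decaying pulses
segment = zeros(N, 1);
segment(1:P:N) = r .^ (0:7)';

% Log magnitude spectrum followed by an inverse FFT
ps = log(abs(fft(segment)));
c  = real(ifft(ps));          % c(k) holds the cepstrum at a lag of k-1 samples

% Search only lags that are plausible pitch periods, which also skips
% the large values near lag zero
minLag = 20;                  % 400 Hz at fs = 8000
maxLag = 200;                 % 40 Hz at fs = 8000
[~, k] = max(c(minLag+1 : maxLag+1));
lag = minLag + k - 1;         % undo the offset of the search window

period_ms = 1000 * lag / fs;  % period in milliseconds
f0 = fs / lag;                % fundamental frequency in Hz
fprintf('lag = %d samples, period = %.3f ms, f0 = %.1f Hz\n', lag, period_ms, f0);
```

For this synthetic frame the peak falls at a lag of 55 samples, which is x-index 56 in a Matlab plot, giving a period of 6.875 ms and a frequency of roughly 145 Hz. With real speech the peak is rarely this clean, so the search range and any smoothing would need to be chosen to suit the material.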