Applied Speech and Audio Processing: With MATLAB Examples
Basic audio processing
Figure 2.2 Absolute FFT plot for audio spectrum, with frequency index along the x-axis and amplitude along the y-axis.

Frequency domain operations, by contrast, usually require the audio to be first converted to the frequency domain by use of a Fourier transform or similar, such as the Fast Fourier Transform (FFT):

    a_spec=fft(a_vector);

or more generally, when the audio vector length is not a power of two, it is possible to zero-pad (or truncate) the audio vector to fill the size of FFT specified, as the following illustrates for a 256-element FFT:

    a_spec=fft(a_vector, 256);

The pertinent question which arises is ‘how big is that transform?’ Which is also asking ‘what frequency resolution is required?’ The reader should already be aware that the number of frequency bins in the FFT output is based on the number of samples given as input. This will be explored further in Section 2.5.2, but for the present, suffice it to say that a convenient power-of-two size is normally chosen for frequency vector length. Another way of achieving this, provided the audio vector holds at least 256 samples, is:

    a_spec=fft(a_vector(1:256));

In Matlab, the resultant frequency-domain vector will be complex. Plotting the absolute value of the vector provides a double-sided frequency representation shown in Figure 2.2 and plotted using:

    plot(abs(a_spec));

In this unusual plot, the frequency axis (if scaled correctly) would start at 0, progress to the Nyquist frequency at the centre point, and then decrease to 0 at the far right. Both positive and negative frequencies are shown, something which is not particularly useful. In fact Matlab differs in this way (for historical reasons) from many other FFT libraries in use for C and FORTRAN programmers.

Figure 2.3 Single-sided absolute FFT plot for the same audio spectrum as shown in Figure 2.2.
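The zero-padding and truncation behaviour described above, and the mirrored shape of the double-sided plot, can be checked with a short sketch. The 1 kHz test tone, the 300-sample length and every variable name other than a_vector are assumptions made purely for illustration.

```matlab
% Hypothetical test signal: 300 samples of a 1 kHz sine sampled at 8 kHz.
Fs = 8000;
n = 0:299;
a_vector = sin(2*pi*1000*n/Fs);

% fft(x, N) truncates x to its first N samples when length(x) > N ...
spec_trunc = fft(a_vector, 256);
spec_slice = fft(a_vector(1:256));
max_diff_trunc = max(abs(spec_trunc - spec_slice));

% ... and appends zeros up to N samples when length(x) < N.
short_vector = a_vector(1:200);
spec_pad = fft(short_vector, 256);
spec_manual = fft([short_vector, zeros(1, 56)]);
max_diff_pad = max(abs(spec_pad - spec_manual));

% For real input the magnitude spectrum is mirrored about the centre:
% element k+1 matches element N-k+1 for k = 1..N-1.
mag = abs(spec_trunc);
mirror_err = max(abs(mag(2:256) - mag(256:-1:2)));

% 1000 Hz / (8000/256 Hz per bin) = bin 32, which is element 33
% because Matlab indexing starts at 1.
[~, peak_bin] = max(mag(1:128));
```

Both difference values should be zero to within rounding error, and the small mirror_err is what makes the right-hand half of the double-sided plot redundant for real-valued audio.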
We can produce a more standard plot with the low frequencies in the centre of the plot using:

    plot(abs(fftshift(a_spec)));

However in audio processing we tend to plot the single-sided spectrum, and give it more useful axes. Plotting the same spectrum with variables Fs=8000 and Ns=256 describing the original sample rate and size of the FFT respectively, a better plot would be achieved with:

    plot( (0 : Ns/2-1)*Fs/Ns, abs(a_spec(1:Ns/2)), 'r');

Element k of the FFT output corresponds to a frequency of (k-1)*Fs/Ns Hz, so the x-axis runs from 0 Hz to just below the Nyquist frequency Fs/2. This plots the spectrum as shown in Figure 2.3, which is clearly a more useful, and physically relevant representation, with the 'r' argument to plot() meaning the plotted line is coloured red on a colour display.

Of course when performing audio processing, some form of analysis would typically be performed on the frequency vector that results from an FFT. This is all well and good, but what if the audio vector contains many more than 256 samples? The answer is that the longer vector will be split (or segmented) into several 256-sample frames, and each frame handled separately.

Segmentation is needed not only because 256 (say) is a convenient size, but when any of the following are true:

1. The audio is continuous (i.e. you can’t wait for a final sample to arrive before beginning processing).
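The segmentation into 256-sample frames described above can be sketched as follows. This is a minimal illustration rather than the book's own listing: the 500 Hz test tone, the non-overlapping frames with no window function, the decision to discard a trailing partial frame, and the names num_frames, freq_axis and frame_mags are all assumptions. The frequency axis maps element k of the FFT output to (k-1)*Fs/Ns Hz.

```matlab
% Assumed parameters, matching the values used in the text.
Fs = 8000;      % sample rate in Hz
Ns = 256;       % frame length and FFT size

% Hypothetical test signal: one second of a 500 Hz sine.
n = 0:Fs-1;
a_vector = sin(2*pi*500*n/Fs);

% Number of complete frames; trailing samples that do not fill a
% whole frame are ignored in this sketch.
num_frames = floor(length(a_vector) / Ns);

% Frequency in Hz for each bin of the single-sided spectrum.
freq_axis = (0:Ns/2-1) * Fs / Ns;

% One row of single-sided magnitudes per frame.
frame_mags = zeros(num_frames, Ns/2);
for f = 1:num_frames
    first = (f-1)*Ns + 1;
    frame = a_vector(first : first+Ns-1);
    a_spec = fft(frame, Ns);
    frame_mags(f, :) = abs(a_spec(1:Ns/2));
end

% Strongest bin in the first frame, converted to Hz.
[~, peak_bin] = max(frame_mags(1, :));
peak_freq = freq_axis(peak_bin);
```

Any single frame can then be drawn with plot(freq_axis, frame_mags(1, :), 'r'); for continuous audio the loop body would instead run each time a new frame of Ns samples becomes available.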