Applied Speech and Audio Processing: With MATLAB Examples
Basic audio processing
• Signal Processing First
J. McClellan, R. W. Schafer, M. A. Yoder (Pearson Education, 2003)
A text for beginners, this book starts with introductory descriptions related to sound, and follows through to show how this can be represented digitally, or by computer. The coverage of basic Fourier analysis, sampling theory, digital filtering and discrete-time systems is gentle yet extensive. It is also possible to obtain a set of MATLAB examples related to the material in this book.

Audio

• A Digital Signal Processing Primer: With Applications to Digital Audio and Computer Music
K. Steiglitz (Prentice-Hall, 1996)
There are few books that cover introductory audio and speech systems alone. This too covers audio as an application of digital signal processing rather than as a subject in its own right. The book thus spends time considering the DSP nature of handling audio signals, which is no bad thing; however, the orientation is good, being practical and relatively focused on the applications.

• Speech and Audio Signal Processing: Processing and Perception of Speech and Music
B. Gold and N. Morgan (Wiley, 1999)
The epitome of speech and audio textbooks, this 560-page tome is divided into 36 chapters that cover literally every aspect of the processing and perception of speech and music. For readers wishing to purchase a single reference text, this would probably be first choice. It is not a book for absolute beginners, and is not orientated towards providing practical methods and details, but for those already comfortable with the main techniques of computer processing of speech and audio, it would be useful in expanding their knowledge.

References

[1] S. W. Smith. Digital Signal Processing: A Practical Guide for Engineers and Scientists. Newnes, 2000. URL www.dspguide.com.
[2] J. W. Gibbs. Fourier series. Nature, 59: 606, 1899.
[3] R. W. Schafer and L. R. Rabiner. Digital representation of speech signals. Proc. IEEE, 63(4): 662–677, 1975.
[4] B. P. Bogert, M. J. R. Healy, and J. W. Tukey. The quefrency analysis of time series for echoes: Cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking. In M. Rosenblatt, editor, Proceedings of the Symposium on Time-Series Analysis, pages 209–243. John Wiley, 1963.
[5] D. G. Childers, D. P. Skinner, and R. C. Kemerait. The cepstrum: A guide to processing. Proc. IEEE, 65(10): 1428–1443, 1977.
[6] F. Zheng, G. Zhang, and Z. Song. Comparison of different implementations of MFCC. J. Computer Sci. and Technol., 16(6): 582–589, 2001.

3 Speech

Chapter 2 described the general handling, processing and visualisation of audio vectors: sequences of samples captured at some particular sample rate, and which together represent sound. This chapter will build upon that foundation, and use it to begin to look at speech.

There is nothing special about speech from an audio perspective – it is simply like any other sound – it is only when we hear it that our brains begin to interpret a particular signal as being speech. A famous experiment demonstrates this using a sentence of sinewave speech: a sound recording made entirely from sinewaves. Initially, the brain of a listener does not consider this to be speech, and so the signal is unintelligible. However, after the corresponding sentence is heard spoken aloud in a normal way, the listener's brain suddenly 'realises' that the signal is in fact speech, and from then on it becomes intelligible. After that the listener cannot 'unlearn' this fact: similar sentences which are generally completely unintelligible to others will be perfectly intelligible to this listener [1].

Apart from this interpretative behaviour of the human brain, there are audio characteristics within music and other sounds that are inherently speech-like in their spectral and temporal characteristics. However, speech itself is a structured set of continuous sounds, by virtue of its production mechanism.
Its characteristics are very well researched, and many specialised analysis, handling and processing methods have been developed over the years especially for this narrow class of audio signals.

Initially turning our back on the computer and speech processing, this chapter will consider the human speech production apparatus, mechanisms, and characteristics. This will be followed by an examination of the physical properties of speech itself resulting from the mechanism of its generation. We will then begin our study of how these properties allow for speech-related processing efforts.