Applied Speech and Audio Processing: With MATLAB Examples
Advanced topics
Despite the profusion of applications, relatively few fundamental psychoacoustic models or modelling techniques are in use. Most applications employ a subset of three elements: a model predicting the masking effect, an equal-loudness pre-emphasis, and an appreciation of frequency discrimination. Unfortunately, masking effect models do not often cater for temporal masking, which we presented as an example above (Figure 7.1). By contrast, simultaneous frequency masking is well catered for in masking models. This type of masking was described and illustrated in Section 4.2.8, particularly in Figure 4.3.
Simultaneous frequency masking is well catered for because it is relatively easily modelled by computer. The models relate to single tones, of given frequency and power, which cause nearby tones of lower power to be inaudible. In Figure 4.3, the shaded area showed the extent of the masking effect for the signal of given frequency response: in essence it is a modified threshold of audibility, similar to the equal-loudness contours of Section 4.2.1. The difference in audibility caused by the presence of the tone is the masking effect. Computational models exist for this masking effect due to tones.
Of note is that much of the historical auditory data used to derive the computerised models was obtained under well-defined and controlled conditions, using artificial signals such as white noise and sinewaves. While it is beyond reasonable doubt that the models describe those scenarios very well, it has not been established with the same confidence that they accurately describe complex real sounds. In fact there is even some doubt that they can be applied to compound sounds [3]. Despite the doubts, these models are used in practice. They assume that a complex sound can be broken down into a set of tones, each of which results in a masking effect such as that shown in Figure 4.3, with the overall masking effect of the sound being the summation of the separate contributions.
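The summation of per-tone masking contributions described above can be illustrated with a short sketch, written here in Python. It is not the model from the text: the triangular spreading shape, its slopes and offset, the tone levels, and the choice to add contributions as powers are all illustrative assumptions, and the Bark conversion uses the common Zwicker-Terhardt approximation rather than anything given in this section.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark (critical band rate) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def tone_mask_db(masker_hz, masker_db, probe_hz,
                 offset_db=10.0, lower_slope=25.0, upper_slope=10.0):
    """Masked threshold (dB) at probe_hz caused by one masking tone.

    Assumed shape: a triangle on the Bark axis that peaks offset_db below
    the masker level and falls by lower_slope dB/Bark below the masker
    and upper_slope dB/Bark above it. All three values are hypothetical.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)

def combined_mask_db(tones, probe_hz):
    """Overall masked threshold: add each tone's contribution as a power,
    then convert the total back to dB."""
    power = sum(10.0 ** (tone_mask_db(f, level, probe_hz) / 10.0)
                for f, level in tones)
    return 10.0 * math.log10(power)

# Two made-up masking tones: (frequency in Hz, level in dB).
tones = [(1000.0, 70.0), (1200.0, 65.0)]
for probe in (900.0, 1100.0, 1500.0):
    print(f"{probe:6.0f} Hz: masked threshold ~ {combined_mask_db(tones, probe):5.1f} dB")
```

Under these assumptions, a probe tone whose level falls below the printed threshold at its frequency would be treated as inaudible. Adding in the power domain is only one possible reading of 'summation'; other combination rules could be substituted in `combined_mask_db` without changing the rest of the sketch.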
When calculating the overall effect, it is possible to introduce nonlinear ‘corrections’ to the model to compensate for the fact that the effect is not a straightforward summation. Perhaps more useful is a critical band model of the ear, in which the loudest tone in each critical band is audible, and the masking effect is the weighted sum of all sound and noise components within that band. Whilst this does not account for several auditory factors, it does model the overall situation quite well, especially where there is a clear distinction between wanted signal and interfering noise. In the remainder of this section we will use this approach to develop a simple but usable psychoacoustic model that has been applied, and tested, in several applications. The model involves several stages of processing. The assumption is that a particular audio signal is modelled to determine the threshold of masking (audibility) due to the sounds it contains. Since there is very little likelihood that the absolute signal levels recorded on computer are the same as those that would impinge on the ear of a listener, it must be stressed that, even assuming no variability among different people, the conclusions drawn from this model should be treated as approximations. The model is provided to enable the reader to enter the field of psychoacoustics rapidly: to adjust, extend and improve on the system described. Many competing, and probably better, models exist,