Applied Speech and Audio Processing: With MATLAB Examples
Speech communications
the correlation by shifting the numerator e(n) only, rather than both the numerator and the denominator. In fact this method is used in most real-time speech coders. It is left as an exercise for the reader to modify the MATLAB function in that way and see whether the same results can be found.

5.3.3 Pitch issues

The pitch extraction method of Section 5.3.2.1, in common with many other methods, often produces an answer equal to half or twice the actual pitch period. These errors are called pitch halving and pitch doubling, and they are the scourge of many engineers working with pitch detection algorithms.

To some extent, setting hard limits on the pitch can provide an answer (saying, for example, that the pitch frequency cannot be less than 50 Hz or more than 300 Hz), but such a range still has ample scope for doubling or halving: it spans a factor of six, so a true pitch of 100 Hz and its erroneous estimates of 50 Hz and 200 Hz all fall inside it. Just think of the comparison between a squeaky five-year-old child’s voice and that of a deep bass voice such as those belonging to Paul Robeson or Luciano Pavarotti.

Many algorithms do not impose absolute limits, but disallow sudden shifts in pitch as being unlikely in a real scenario. This is true of speech, but such decisions tend to result in speech systems unable to handle music, singing, or DTMF (dual-tone multi-frequency) and facsimile signalling. G.728, for example, limits the pitch slew rate between subframes to ±6 samples unless the relative strength of the new pitch component is at least 2.5 times greater than that of the previous frame [18]. Any limits used are likely to require empirical testing with a range of subject matter.

5.4 Analysis-by-synthesis

The idea behind analysis-by-synthesis at the encoder is to analyse a frame (or more) of speech and extract parameters from it. These parameters are then used to create a frame of reconstructed speech. The frames of original and reconstructed speech are then compared to see how closely they match.
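The analyse, synthesise and compare cycle can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the structure of any real coder: the "speech" frame is a plain sinewave, the only parameters are a gain and a period in samples, the match is measured as a summed squared error, and the function names (synthesise, squared_error, analysis_by_synthesis) are invented for this example.

```python
import math


def synthesise(gain, period, length):
    """Toy synthesiser (assumption): a sinewave of given gain and period in samples."""
    return [gain * math.sin(2 * math.pi * n / period) for n in range(length)]


def squared_error(original, reconstructed):
    """Summed squared difference between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed))


def analysis_by_synthesis(frame, gains, periods):
    """Reconstruct the frame from every candidate parameter set, keep the closest."""
    best = None
    for gain in gains:
        for period in periods:
            candidate = synthesise(gain, period, len(frame))
            err = squared_error(frame, candidate)
            if best is None or err < best[0]:
                best = (err, gain, period)
    return best  # (error, gain, period)


# Hypothetical "original" frame: gain 0.8, period 40 samples, 160 samples long.
frame = synthesise(0.8, 40, 160)
gains = [0.25, 0.5, 0.75, 1.0]   # coarse candidate gain levels (assumed)
periods = range(20, 81)          # candidate periods in samples (assumed)
err, gain, period = analysis_by_synthesis(frame, gains, periods)
print(gain, period)
```

With these assumed candidates the search settles on a period of 40 samples and a gain of 0.75, the closest available gain to the true value of 0.8. The residual error is small but non-zero because the gain grid does not contain 0.8.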
Some part of the parameter extraction process is then varied to create a slightly different set of parameters, which is used to produce a new reconstructed frame that is in turn compared to the original speech. Perhaps several hundred iterations are made across a search space, and the best set of parameters (judged by how closely the reconstructed speech matches the original) is then transmitted to the receiver.

Something to consider is that the parameters may need to be quantised before being transmitted to the decoder. In this case the quantised-dequantised parameters are the ones the encoder uses to check how good the match is, since these are the values the decoder will actually work from.

Before we look at the most famous of the analysis-by-synthesis coding structures, it is important to remember that ‘degree of matching’, calculated as a difference between vectors, may not relate at all to how a human perceives degree of difference. As a very trivial example, imagine a continuous sinewave original signal. Next imagine a version which is delayed by a few degrees. The degree of matching in a mean-squared sense will
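The sinewave example can be put into numbers with a short Python sketch. It assumes a unit-amplitude sinewave sampled over a whole number of cycles, and the function name and sampling figures are chosen purely for illustration. Under those assumptions the mean-squared error between the original and a copy delayed by a phase angle φ is 1 − cos φ, which the code evaluates directly from the samples.

```python
import math


def mse_for_phase_shift(shift_degrees, samples_per_cycle=360, cycles=10):
    """Mean-squared error between a unit sinewave and a phase-delayed copy.

    Assumptions for illustration: unit amplitude, whole cycles, uniform sampling.
    """
    shift = math.radians(shift_degrees)
    count = samples_per_cycle * cycles
    total = 0.0
    for n in range(count):
        angle = 2 * math.pi * n / samples_per_cycle
        total += (math.sin(angle) - math.sin(angle - shift)) ** 2
    return total / count


for degrees in (0, 5, 10, 90, 180):
    measured = mse_for_phase_shift(degrees)
    predicted = 1 - math.cos(math.radians(degrees))  # closed form, 1 - cos(phi)
    print(degrees, round(measured, 4), round(predicted, 4))
```

The measured values follow the closed form: the error is zero with no delay and grows with the phase shift, reaching twice the peak amplitude squared at 180 degrees, even though in every case the two signals are the same tone at the same level.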