Applied Speech and Audio Processing: With MATLAB Examples


Speech communications
the correlation by shifting the numerator e(n) only, rather than both the numerator and
the denominator. In fact this method is used in most real-time speech coders. It is left as
an exercise for the reader to modify the Matlab function in that way and see whether
the same results can be found.
5.3.3 Pitch issues
The pitch extraction method of Section 5.3.2.1, in common with many other methods,
often produces an answer equal to half or twice the actual pitch period. These errors are
called pitch halving and pitch doubling, and they are the scourge of many engineers
working with pitch detection algorithms.
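A minimal Python sketch of why these errors arise, assuming a correlation-based pitch estimator. It is not the function from Section 5.3.2.1; the helper names `normalised_correlation` and `make_frame`, the 40-sample period and the harmonic amplitudes are all illustrative assumptions. A signal that repeats every P samples also repeats every 2P samples, so the doubled lag scores just as well as the true one, and a strong second harmonic can make the halved lag score highly too.

```python
import math

def normalised_correlation(x, lag):
    """Normalised correlation between x[n] and x[n - lag] over one frame."""
    rng = range(lag, len(x))
    num = sum(x[n] * x[n - lag] for n in rng)
    energy_now = sum(x[n] ** 2 for n in rng)
    energy_past = sum(x[n - lag] ** 2 for n in rng)
    return num / math.sqrt(energy_now * energy_past)

def make_frame(period, a1, a2, length=400):
    """Hypothetical 'voiced' frame: a fundamental plus its second harmonic."""
    w = 2.0 * math.pi / period
    return [a1 * math.sin(w * n) + a2 * math.sin(2.0 * w * n)
            for n in range(length)]

TRUE_PERIOD = 40  # samples (an assumed value for illustration)

# Pitch doubling: lag 2P matches the signal as well as lag P does.
frame = make_frame(TRUE_PERIOD, a1=1.0, a2=0.5)
score_true = normalised_correlation(frame, TRUE_PERIOD)
score_double = normalised_correlation(frame, 2 * TRUE_PERIOD)

# Pitch halving: with a dominant second harmonic, lag P/2 also scores highly.
strong_harmonic = make_frame(TRUE_PERIOD, a1=0.3, a2=1.0)
score_half = normalised_correlation(strong_harmonic, TRUE_PERIOD // 2)

print(score_true, score_double, score_half)
```

An estimator that simply picks the lag with the highest score has no principled way to choose between the true and doubled lags here, which is why extra constraints are usually needed.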
To some extent, setting hard limits on pitch can provide an answer (for example, saying
that pitch cannot be less than 50 Hz or more than 300 Hz), but such a range
still has ample scope for doubling or halving. Just think of the comparison between a
squeaky five-year-old child’s voice and a deep bass voice such as that of
Paul Robeson or Luciano Pavarotti. Many algorithms do not impose absolute limits,
but instead disallow sudden shifts in pitch as being unlikely in a real scenario. That is
true of speech, but such decisions tend to result in speech systems that are unable to
handle music, singing, or DTMF (dual-tone multi-frequency) and facsimile signalling.
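The arithmetic behind "ample scope for doubling or halving" can be sketched as follows. The 8 kHz sampling rate is an assumption (the text gives only the 50 Hz and 300 Hz limits), and the helper names `lag_search_range` and `competing_lags` are hypothetical. For simplicity, only fractions that divide the true lag exactly are listed.

```python
import math

def lag_search_range(fs_hz, f_min_hz, f_max_hz):
    """Convert pitch frequency limits into a lag search range in samples."""
    min_lag = math.ceil(fs_hz / f_max_hz)   # shortest allowed pitch period
    max_lag = math.floor(fs_hz / f_min_hz)  # longest allowed pitch period
    return min_lag, max_lag

def competing_lags(true_lag, min_lag, max_lag):
    """Multiples and exact fractions of true_lag that remain inside the range."""
    multiples = [k * true_lag for k in range(2, max_lag // true_lag + 1)]
    fractions = [true_lag // k
                 for k in range(2, true_lag // min_lag + 1)
                 if true_lag % k == 0]
    return sorted(lag for lag in multiples + fractions
                  if min_lag <= lag <= max_lag)

FS_HZ = 8000  # assumed sampling rate, not stated in the text
min_lag, max_lag = lag_search_range(FS_HZ, 50, 300)

# A 100 Hz voice (80-sample period) can still be halved or doubled in range.
rivals_100hz = competing_lags(80, min_lag, max_lag)
# A 200 Hz voice (40-sample period) can be doubled, tripled or quadrupled.
rivals_200hz = competing_lags(40, min_lag, max_lag)

print(min_lag, max_lag, rivals_100hz, rivals_200hz)
```

Under these assumptions the allowed lags span 27 to 160 samples, a range of nearly six to one, so hard limits alone leave wrong candidates in play for most voices.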
G.728, for example, limits the pitch slew rate between subframes to ±6 samples unless
the relative strength of the new pitch component is at least 2.5 times greater than that
of the previous frame [18]. Any limits used are likely to require empirical testing with a
range of subject matter.
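The rule as described above can be sketched in Python. This is not the G.728 procedure itself: the function name `limit_pitch_slew`, the generic "strength" measure, and the choice to clamp a rejected jump to the edge of the allowed window are all assumptions made for illustration. Only the ±6 sample limit and the factor of 2.5 come from the text.

```python
MAX_SLEW = 6          # samples per subframe, as quoted in the text
STRENGTH_RATIO = 2.5  # required relative strength, as quoted in the text

def limit_pitch_slew(prev_lag, prev_strength, new_lag, new_strength):
    """Return the pitch lag to use for the current subframe."""
    # Small changes are always accepted.
    if abs(new_lag - prev_lag) <= MAX_SLEW:
        return new_lag
    # A large jump is accepted only if the new component is much stronger.
    if new_strength >= STRENGTH_RATIO * prev_strength:
        return new_lag
    # Otherwise restrict the change (clamping is an assumption of this sketch).
    if new_lag > prev_lag:
        return prev_lag + MAX_SLEW
    return prev_lag - MAX_SLEW

print(limit_pitch_slew(40, 1.0, 43, 0.5))  # small change
print(limit_pitch_slew(40, 1.0, 80, 1.2))  # weak candidate at double the lag
print(limit_pitch_slew(40, 1.0, 80, 2.5))  # strong candidate at double the lag
```

A rule of this kind suppresses isolated doubling errors in speech, but it will also lag behind genuine jumps in pitch, which is why the thresholds need empirical tuning.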
5.4 Analysis-by-synthesis
The idea behind analysis-by-synthesis at the encoder is to analyse a frame (or more)
of speech and extract parameters from it. These parameters are then used to create a
frame of reconstructed speech. The frames of original and reconstructed speech are then
compared to see how closely they match. Some part of the parameter extraction process
is then varied to create a slightly different set of parameters, which are in turn used to
reconstruct a frame that is compared to the original speech.
Perhaps several hundred iterations are made across a search space, and the best set of
parameters (based on how close the match is between original and reconstructed speech)
is then transmitted to the receiver. Something to consider is that the parameters may
need to be quantised before being transmitted to the decoder. In this case the
quantised-dequantised parameters are the ones used by the encoder to check how good
the match is.
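The search loop described above can be sketched in Python with a toy gain-shape model. The model, the four-entry codebook, the 0.25 gain step and every function name here are illustrative assumptions, not a codec from the text. The point to notice is that the encoder synthesises each candidate from the dequantised gain, exactly as the decoder would, before measuring the error.

```python
def quantise_gain(gain, step=0.25):
    """Map a gain to the integer index that would be transmitted."""
    return round(gain / step)

def dequantise_gain(index, step=0.25):
    """Recover the gain value the decoder would actually use."""
    return index * step

def synthesise(shape, gain):
    """Reconstruct a frame from the parameters (toy model: gain times shape)."""
    return [gain * s for s in shape]

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def analysis_by_synthesis(target, codebook):
    """Try every candidate and keep the parameters with the closest match."""
    best = None
    for shape_index, shape in enumerate(codebook):
        # Analysis: least-squares gain for this candidate shape.
        energy = sum(s * s for s in shape)
        ideal_gain = sum(t * s for t, s in zip(target, shape)) / energy
        # Quantise, then synthesise from the dequantised value.
        gain_index = quantise_gain(ideal_gain)
        candidate = synthesise(shape, dequantise_gain(gain_index))
        err = squared_error(target, candidate)
        if best is None or err < best[0]:
            best = (err, shape_index, gain_index)
    return best

# Hypothetical codebook and original frame.
codebook = [[1, 0, 0, 0], [1, 1, 0, 0], [1, -1, 1, -1], [0, 1, 1, 0]]
target = [0.1, 0.7, 0.8, -0.1]

err, shape_index, gain_index = analysis_by_synthesis(target, codebook)
print(shape_index, gain_index, err)
```

Only `shape_index` and `gain_index` would be transmitted. Because the error is measured on what the decoder will reconstruct, quantisation loss is accounted for in the choice of parameters rather than discovered afterwards.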
Before we look at the most famous of the analysis-by-synthesis coding structures, it
is important to remember that ‘degree of matching’, calculated as a difference between
vectors, may not relate at all to how a human perceives degree of difference. As a very
trivial example, imagine a continuous sinewave original signal. Next imagine a version
whose phase is delayed by a few degrees. The degree of matching in a mean-squared sense will
