5.3.2.1 Pitch extraction
A wide variety of pitch extraction methods has been published (some are given later in
Section 6.2.1); here we describe one of the simpler and more common methods used in
speech coders. This method relies upon minimising the mean-squared error between an
LPC residual (containing pitch) and the reconstructed pitch signal resulting from the
analysis.
If E is the mean-squared error, e is the residual and e′ is the reconstructed pitch signal
after analysis, then:

$$E(M, \beta) = \sum_{n=0}^{N-1} \{e(n) - e'(n)\}^2 \tag{5.35}$$
and assuming a single-tap pitch filter as in Equation (5.33), then:

$$E(M, \beta) = \sum_{n=0}^{N-1} \{e(n) - \beta e(n - M)\}^2 \tag{5.36}$$
where N is the analysis window size (usually one or more subframes). To find the
β and M that together minimise the mean-squared error (i.e. best reproduce the original
pitch signal), we differentiate the expression with respect to β and set the result to zero:
$$\frac{\partial E}{\partial \beta} = \sum_{n=0}^{N-1} \{2\beta e^2(n - M) - 2e(n)e(n - M)\} = 0 \tag{5.37}$$
so

$$\beta_{\mathrm{optimum}} = \frac{\sum_{n=0}^{N-1} e(n)e(n - M)}{\sum_{n=0}^{N-1} e^2(n - M)}. \tag{5.38}$$
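Equations (5.36) and (5.38) can be sketched in Python with NumPy as follows. This is a minimal illustration rather than a reference implementation: the function names, the `start` argument marking where n = 0 falls in the buffer, and the brute-force loop over candidate lags are all assumptions made for the example. Because e(n − M) reaches back before the analysis window, the buffer is assumed to hold at least M samples of history ahead of `start`.

```python
import numpy as np


def optimum_beta(e, start, N, M):
    """Gain of Equation (5.38) for one candidate lag M.

    e     : 1-D residual buffer (hypothetical layout: history, then window)
    start : index in e of the sample treated as n = 0; must be >= M
    N     : analysis window size
    """
    cur = e[start:start + N]             # e(n),     n = 0..N-1
    past = e[start - M:start - M + N]    # e(n - M), n = 0..N-1
    denom = np.dot(past, past)
    if denom == 0.0:
        return 0.0  # delayed segment has no energy; fall back to zero gain
    return float(np.dot(cur, past) / denom)


def pitch_error(e, start, N, M, beta):
    """Squared error of Equation (5.36) for a given lag M and gain beta."""
    cur = e[start:start + N]
    past = e[start - M:start - M + N]
    return float(np.sum((cur - beta * past) ** 2))


def search_lag(e, start, N, lags):
    """Brute-force search: try each lag, keep the one with the smallest error."""
    best = None
    for M in lags:
        beta = optimum_beta(e, start, N, M)
        err = pitch_error(e, start, N, M, beta)
        if best is None or err < best[2]:
            best = (M, beta, err)
    return best  # (lag, gain, error)


# Synthetic residual for illustration: each sample is half the one 40 earlier.
rng = np.random.default_rng(0)
e = np.zeros(400)
e[:40] = rng.standard_normal(40)
for n in range(40, 400):
    e[n] = 0.5 * e[n - 40]

M_best, beta_best, err_best = search_lag(e, start=160, N=80, lags=range(20, 70))
print(M_best, beta_best, err_best)
```

On this synthetic signal the search should recover the lag of 40 samples and a gain close to 0.5, since the signal was built that way. Real residuals are noisy, so the minimum error will not be near zero.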
We can now substitute the optimum β from Equation (5.38) into (5.36) to give the
optimum M from:

$$E_{\mathrm{optimum}}(M) = \sum_{n=0}^{N-1} e^2(n) - E'_{\mathrm{optimum}}(M) \tag{5.39}$$
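The substitution leading to Equation (5.39) can be made explicit by expanding the square in (5.36). The following is a worked sketch; the shorthand C(M) and G(M) is introduced here for brevity and is not part of the original notation.

$$C(M) = \sum_{n=0}^{N-1} e(n)e(n - M), \qquad G(M) = \sum_{n=0}^{N-1} e^2(n - M), \qquad \beta_{\mathrm{optimum}} = \frac{C(M)}{G(M)}$$

$$E(M, \beta) = \sum_{n=0}^{N-1} e^2(n) - 2\beta\, C(M) + \beta^2 G(M)$$

$$E(M, \beta_{\mathrm{optimum}}) = \sum_{n=0}^{N-1} e^2(n) - \frac{2C^2(M)}{G(M)} + \frac{C^2(M)}{G(M)} = \sum_{n=0}^{N-1} e^2(n) - \frac{C^2(M)}{G(M)}$$

The term subtracted in Equation (5.39) therefore equals C²(M)/G(M): the squared cross-correlation between the residual and its delayed copy, divided by the energy of the delayed copy.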
As only the second term of Equation (5.39) varies with M, it must be maximised
to minimise the error. Thus the following must be evaluated for each permissible
value of M, and the value of M at which the maximum occurs stored: