7.9. Voice and pitch changer
[Figure 7.9: relative amplitude (dB, 0 to 50) plotted against frequency (Hz, 0 to 4000), with LSP indices 1 to 10 labelled.]
Figure 7.9 An original spectrum (dashed line) transformed to widen the bandwidth by spreading spectral peaks (solid line) through the use of LSP adjustment.
7.9 Voice and pitch changer
If we record some speech at one sample rate and play it back at another, we may notice
changes in the perceived frequency of the replayed speech. For example, recording a
sentence at 8 kHz and replaying it at 12 kHz in Matlab would make the output obviously
different, as if it had been sped up. In fact, the output would be both higher in frequency
and spoken more quickly: every frequency component is raised by a factor of 12/8 = 1.5,
and the duration shrinks to two-thirds of the original. However, the result probably does
not sound like human speech; it has obviously been processed in some way.
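The effect of replaying at a different rate can be checked numerically. The sketch below is in Python rather than Matlab, and assumes a 200 Hz sine tone as a hypothetical stand-in for a recorded voiced sound; the helper names `tone`, `apparent_frequency` and `duration` are illustrative only. The samples themselves never change, only the rate at which they are interpreted, so the estimated frequency rises by 12/8 while the duration falls by the same factor.

```python
import math


def tone(freq_hz, fs, duration_s):
    """Sampled sine wave (assumed stand-in for a recorded voiced sound)."""
    n = int(round(fs * duration_s))
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]


def apparent_frequency(samples, fs):
    """Rough frequency estimate: upward zero crossings per second of playback."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * fs / len(samples)


def duration(samples, fs):
    """Playback duration in seconds when the samples are output at rate fs."""
    return len(samples) / fs


if __name__ == "__main__":
    recording = tone(200.0, 8000, 1.0)  # "recorded" at 8 kHz
    for fs_play in (8000, 12000):       # replay at the original and at a higher rate
        print(fs_play, "Hz playback:",
              round(apparent_frequency(recording, fs_play), 1), "Hz,",
              round(duration(recording, fs_play), 3), "s")
```

Run as written, it should report roughly 200 Hz over 1 s at 8 kHz and roughly 300 Hz over about 0.667 s at 12 kHz; the zero-crossing count is a coarse estimator, so an error of a hertz or two is expected.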
The reason is that speech is generated by a combination of several physiological
processes, as discussed in Chapter 3, not all of which scale linearly. Put another way,
although the voice of a child may have a pitch rate twice that of a man, the syllabic rate
is unlikely to be twice as fast. Furthermore, the formant frequencies of a child are unlikely
to be twice those of a man.
So changing the frequency of a human voice is a non-trivial operation. We cannot
simply double everything and expect the output to sound convincingly human. Pitch
certainly must be scaled to adjust the voice ‘frequency’, but it needs to be adjusted
differently from, and independently of, the other components of speech.
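The need to scale pitch independently can be made concrete by describing a voice transformation with separate factors for pitch, formant frequencies and speaking rate. In this minimal Python sketch, the `VoiceScaling` class, `naive_replay` and `apply_scaling` are hypothetical, and a voice is reduced to three numbers purely for illustration. Naive replay at a new sample rate forces all three factors to be equal, whereas a voice changer is free to set them separately.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VoiceScaling:
    """Hypothetical multiplicative factors for three aspects of a voice."""
    pitch: float     # multiplier on the fundamental (pitch) frequency
    formants: float  # multiplier on every formant frequency
    rate: float      # multiplier on speaking rate (2.0 = twice as fast)


def naive_replay(fs_record: float, fs_play: float) -> VoiceScaling:
    """Replaying samples at a new rate applies ONE factor to everything."""
    r = fs_play / fs_record
    return VoiceScaling(pitch=r, formants=r, rate=r)


def apply_scaling(s: VoiceScaling, f0_hz: float, formants_hz: List[float],
                  duration_s: float) -> Tuple[float, List[float], float]:
    """Return the transformed (f0, formant list, duration)."""
    return (f0_hz * s.pitch,
            [f * s.formants for f in formants_hz],
            duration_s / s.rate)


if __name__ == "__main__":
    # Illustrative voice only: f0 = 100 Hz, two formants, a 2-second utterance.
    voice = (100.0, [500.0, 1500.0], 2.0)
    print("naive 8k->12k:", apply_scaling(naive_replay(8000, 12000), *voice))
    # Independent control: raise pitch only, keep formants and tempo unchanged.
    print("pitch only:   ", apply_scaling(VoiceScaling(2.0, 1.0, 1.0), *voice))
```

The "pitch only" setting is simply a demonstration of independent control, not a recipe for any particular target voice.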
Practical methods of vocal frequency translation exist both in the time domain and in
the linear-prediction domain. In either case, the key aspect of the technology is to
stretch the speaker's voice in some way. Having discussed the importance of pitch
in Section 5.3, and the role of the particular shape of the pitch pulse, such stretching or