Applied Speech and Audio Processing: With MATLAB Examples
5.2. Parameterisation
The set of P linear equations can thus be represented as the following matrix:

$$
\begin{bmatrix}
R(0) & R(1) & R(2) & \cdots & R(P-1) \\
R(1) & R(0) & R(1) & \cdots & R(P-2) \\
R(2) & R(1) & R(0) & \cdots & R(P-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R(P-1) & R(P-2) & R(P-3) & \cdots & R(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_P \end{bmatrix}
=
\begin{bmatrix} R(1) \\ R(2) \\ R(3) \\ \vdots \\ R(P) \end{bmatrix}
$$
(5.14)

In practice a window, usually Hamming (Section 2.4.2), is applied to the input speech prior to calculating the autocorrelation functions, and the entire set of autocorrelation results is usually normalised by dividing by R(0) first. These normalised coefficients are denoted r(i).

Standard techniques exist for the matrix solution including brute force matrix inversion, the famous Durbin–Levinson–Itakura method, or the Le Roux method, which is slightly less efficient but is a compact and easily followed recursive formula [3]:

$$k_{n+1} = \frac{e^{n}_{n+1}}{e^{n}_{0}} \quad \text{for } n = 0, \ldots, P$$
(5.15)

$$e^{n+1}_{0} = e^{n}_{0} - k_{n+1}\, e^{n}_{n+1} = e^{n}_{0}\left(1 - k_{n+1}^{2}\right)$$
(5.16)

$$e^{n+1}_{i} = e^{n}_{i} - k_{n+1}\, e^{n}_{n+1-i} \quad \text{for } i = n, \ldots, P$$
(5.17)

where the initial conditions for the recursion are set to $e^{0}_{i} = R(i)$ for each i in the set of P equations.

The values of k that are obtained from the Le Roux method are the reflection coefficients.

5.2.3 Converting between reflection coefficients and LPCs

Conversion from the reflection coefficients (which in some cases are quantised and transmitted from speech encoder to decoder) to the LPC parameters, which are required for the LPC synthesis filter to recreate encoded speech, is not difficult. The relationship is shown in Equation (5.18) for all P coefficients, where the notation $a^{i}_{j}$ indicates the jth LPC coefficient at time instant i:

$$a^{i}_{j} = a^{(i-1)}_{j} + k_{i}\, a^{(i-1)}_{(i-j)} \quad \text{with } 1 \le j \le i - 1.$$
(5.18)

In order to perform the reverse conversion from LPC parameters into reflection coefficients, we start with the initial value:

$$k_{i} = a^{i}_{i}$$

and then follow with:

$$a^{(i-1)}_{j} = \frac{a^{i}_{j} - a^{i}_{i}\, a^{i}_{(i-j)}}{1 - k_{i}^{2}} \quad \text{with } 1 \le j \le i - 1$$
(5.19)

where Equation (5.19) is repeated with i decreasing from P to 1, with initial conditions of $a^{P}_{j} = a_{j}$ for all j between 1 and P.

5.2.4 Line spectral pairs

Line spectral pairs (LSPs) are a direct mathematical transformation of the set of LPC parameters, and are generated within many speech compression systems, such as the more modern CELP coders (which will be discussed later in Section 5.4.1). LSPs are popular due to their excellent quantisation characteristics and consequent efficiency of representation. They are also commonly referred to as Line Spectral Frequencies (LSF) [3].

LSPs collectively describe the two resonance conditions arising from an interconnected tube model of the human vocal tract. This includes mouth shape and nasal cavity, and forms the basis of the underlying physical relevance of the linear prediction representation. The two resonance conditions describe the modelled vocal tract as being either fully open or fully closed at the glottis respectively (compare this to the model of the reflection coefficients in Section 5.2.2). The model in question is constructed from a set of equal length but different diameter tubes, so the two conditions mean the source end is either closed or open respectively.

The two conditions give rise to two sets of resonant frequencies, with the number of resonances in each set being determined by the number of joined tubes (which is defined by the order of the analysis system). The resonances of each condition give rise to odd and even line spectral frequencies respectively, and are interleaved into a set of LSPs which have monotonically increasing value.
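The interleaving described above can be checked numerically. The following is a minimal Python sketch rather than the book's own MATLAB code. It builds a stable set of LPCs from reflection coefficients using the recursion of Equation (5.18), inverts it with Equation (5.19), and then locates the line spectral frequencies as the unit-circle roots of the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), where p is the analysis order. That construction is the standard one and is assumed here, since this section has not yet derived it. The function names, the convention A(z) = 1 + sum of a_j z^-j, the grid size and the example reflection coefficients are all illustrative choices.

```python
import math


def reflection_to_lpc(k):
    """Equation (5.18): build LPCs a_1..a_P from reflection coefficients k_1..k_P."""
    a = []
    for i, ki in enumerate(k, start=1):
        # a_j^(i) = a_j^(i-1) + k_i * a_(i-j)^(i-1) for 1 <= j <= i-1, and a_i^(i) = k_i
        a = [a[j - 1] + ki * a[i - j - 1] for j in range(1, i)] + [ki]
    return a


def lpc_to_reflection(a):
    """Equation (5.19): recover k_1..k_P from LPCs, with i running from P down to 1."""
    a = list(a)
    k = [0.0] * len(a)
    for i in range(len(a), 0, -1):
        ki = a[i - 1]  # k_i = a_i^(i); assumes |k_i| != 1 so the division is defined
        k[i - 1] = ki
        a = [(a[j - 1] - ki * a[i - j - 1]) / (1.0 - ki * ki) for j in range(1, i)]
    return k


def _roots_in_open_interval(f, grid):
    """Find roots of f on (0, pi): coarse grid search, then bisection in each cell."""
    found = []
    ws = [math.pi * m / grid for m in range(1, grid)]
    for lo, hi in zip(ws, ws[1:]):
        neg_lo = f(lo) < 0.0
        if neg_lo != (f(hi) < 0.0):  # sign change inside this cell
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if (f(mid) < 0.0) == neg_lo:
                    lo = mid
                else:
                    hi = mid
            found.append(0.5 * (lo + hi))
    return found


def lsp_from_lpc(a, grid=4096):
    """Return (P-roots, Q-roots) in radians on (0, pi) for A(z) = 1 + sum a_j z^-j."""
    order = len(a)
    c = [1.0] + list(a) + [0.0]             # A(z) padded to degree order + 1
    rev = c[::-1]                           # coefficients of z^-(order+1) * A(1/z)
    sym = [x + y for x, y in zip(c, rev)]   # P(z): symmetric coefficients
    asym = [x - y for x, y in zip(c, rev)]  # Q(z): antisymmetric coefficients
    half = (order + 1) / 2.0

    # On the unit circle each polynomial reduces to a real function of frequency w.
    def f_sym(w):
        return sum(cn * math.cos(w * (half - n)) for n, cn in enumerate(sym))

    def f_asym(w):
        return sum(cn * math.sin(w * (half - n)) for n, cn in enumerate(asym))

    return _roots_in_open_interval(f_sym, grid), _roots_in_open_interval(f_asym, grid)


# Illustrative fourth-order example; these reflection coefficients are made up.
k_demo = [-0.8, 0.5, -0.2, 0.1]
p_roots, q_roots = lsp_from_lpc(reflection_to_lpc(k_demo))
print("P(z) roots (rad):", p_roots)
print("Q(z) roots (rad):", q_roots)
```

When every reflection coefficient has magnitude below one, merging the two root lists should give frequencies that alternate between the two sets, which is the monotonically increasing ordering described above. The trivial roots at 0 and pi are excluded by searching only the open interval. Grid search is used for transparency; it can miss roots that lie closer together than the grid spacing.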
In reality, however, the human glottis opens and closes rapidly during speech – it is neither fully closed nor fully open. Hence actual resonances occur at frequencies located somewhere between the two extremes of each LSP condition. Nevertheless, this relationship between resonance and LSP position lends a significant physical basis to the representation.

Figure 5.10 illustrates LSPs overlaid on an LPC spectral plot (made using the lpcsp() MATLAB function given later in Section 5.2.4.3). The 10 vertical lines were drawn at the LSP frequencies, and show the odd and even frequencies being interleaved. Both the LSPs and the spectrum were derived from the same set of tenth-order linear prediction parameters, which were obtained from a linear predictive analysis of a 20 ms voiced speech frame.

Notable features of Figure 5.10 include the natural interleaving of the odd and even LSP frequencies, and the fact that spectral peaks tend to be bracketed by a narrow pair of lines (explained by the comment previously indicating that the actual resonance