Lecture Notes in Computer Science
To implement MAP estimation for the $d$-th projection matrix $U_d$, the EM algorithm is applied here. The expectation of the log likelihood of the complete data with respect to $p(x_{d;j} \mid t_{d;j}, U_d, \mu_d, \sigma_d)$ is given by

$E[L_c] = \sum_{d=1}^{M} \sum_{j=1}^{n l_{\neq d}} E\big[\log p(t_{d;j}, x_{d;j})\big]$.  (9)

Here, $\log p(t_{d;j}, x_{d;j})$ with the given $U_k|_{1 \le k \le M, k \neq d}$ is given by

$\log p(t_{d;j}, x_{d;j}) \propto -\|x_{d;j}\|^2 - l_d \log \sigma_d^2 - \sigma_d^{-2} \|t_{d;j} - U_d x_{d;j} - \mu_d\|^2$.  (10)

It is impossible to maximize $E[L_c]$ with respect to all projection matrices $U_d|_{d=1}^{M}$ simultaneously, because different projection matrices are inter-related [7] during the optimization procedure; i.e., the $U_{j \neq d}$ must be known to optimize $U_d$. Therefore, we apply an alternating optimization procedure [7]. To optimize the $d$-th projection matrix $U_d$ together with $\sigma_d^2$, we need the decoupled expectation of the log likelihood function on the $d$-th mode:

$\sum_{j=1}^{n l_{\neq d}} E\big[\log p(t_{d;j}, x_{d;j})\big] \propto -\sum_{j=1}^{n l_{\neq d}} \Big( \mathrm{tr}\big(E[x_{d;j} x_{d;j}^T]\big) + l_d \log \sigma_d^2 + \sigma_d^{-2} \big( \|t_{d;j} - \mu_d\|^2 - 2\, E[x_{d;j}]^T U_d^T (t_{d;j} - \mu_d) + \mathrm{tr}\big(U_d^T U_d\, E[x_{d;j} x_{d;j}^T]\big) \big) \Big)$.  (11)
Based on (6), we then have

$E[x_{d;j}] = M_d^{-1} U_d^T (t_{d;j} - \mu_d)$  (12)

and

$E[x_{d;j} x_{d;j}^T] = \sigma_d^2 M_d^{-1} + E[x_{d;j}]\, E[x_{d;j}]^T$,  (13)

where $M_d = U_d^T U_d + \sigma_d^2 I$. Eqs. (12) and (13) form the expectation step, or E-step. The maximization step, or M-step, is obtained by maximizing $\sum_{j=1}^{n l_{\neq d}} E[\log p(t_{d;j}, x_{d;j})]$ with respect to $U_d$ and $\sigma_d^2$. In detail, by setting $\partial_{U_d} \big[\sum_{j=1}^{n l_{\neq d}} E[\log p(t_{d;j}, x_{d;j})]\big] = 0$, we have

$U_d = \Big[\sum_{j=1}^{n l_{\neq d}} (t_{d;j} - \mu_d)\, E[x_{d;j}]^T\Big] \Big[\sum_{j=1}^{n l_{\neq d}} E[x_{d;j} x_{d;j}^T]\Big]^{-1}$;  (14)
Probabilistic Tensor Analysis with Akaike and Bayesian Information Criteria 797

and by setting $\partial_{\sigma_d^2} \big[\sum_{j=1}^{n l_{\neq d}} E[\log p(t_{d;j}, x_{d;j})]\big] = 0$, we have

$\sigma_d^2 = \frac{1}{n l_{\neq d}\, l_d} \sum_{j=1}^{n l_{\neq d}} \Big( \|t_{d;j} - \mu_d\|^2 - 2\, E[x_{d;j}]^T U_d^T (t_{d;j} - \mu_d) + \mathrm{tr}\big(E[x_{d;j} x_{d;j}^T]\, U_d^T U_d\big) \Big)$.  (15)
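As an illustration, one EM sweep of (12)–(15) for a single mode can be sketched in numpy as follows; the function and variable names are ours, and the mode-$d$ vectors $t_{d;j}$ are assumed to be stacked as the rows of a matrix:

```python
import numpy as np

def em_step_mode_d(T, U, sigma2, mu):
    """One EM sweep for mode d, following (12)-(15).

    T      : (N, l_d) matrix whose rows are the mode-d vectors t_{d;j},
             with N = n * l_{(not d)} samples.
    U      : (l_d, l_d') current projection matrix U_d.
    sigma2 : current noise variance sigma_d^2.
    mu     : (l_d,) mode-d mean mu_d.
    """
    N, l_d = T.shape
    l_dp = U.shape[1]
    Tc = T - mu                              # t_{d;j} - mu_d, row-wise
    M = U.T @ U + sigma2 * np.eye(l_dp)      # M_d = U_d^T U_d + sigma_d^2 I
    Minv = np.linalg.inv(M)

    # E-step, (12)-(13): posterior moments of the latent vectors
    Ex = Tc @ U @ Minv                       # row j holds E[x_{d;j}]^T
    Sxx = N * sigma2 * Minv + Ex.T @ Ex      # sum_j E[x_{d;j} x_{d;j}^T]

    # M-step, (14): U_d = [sum (t - mu) E[x]^T] [sum E[x x^T]]^{-1}
    U_new = (Tc.T @ Ex) @ np.linalg.inv(Sxx)

    # M-step, (15): noise variance update
    s = (np.sum(Tc ** 2)
         - 2.0 * np.sum((Tc @ U_new) * Ex)
         + np.trace(Sxx @ U_new.T @ U_new))
    sigma2_new = s / (N * l_d)
    return U_new, sigma2_new
```

Iterating this update until convergence, for each mode in turn, realizes the alternating optimization described above.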
After obtaining the projection matrices $U_d|_{d=1}^{M}$, the following operations are important for different applications:

Dimension Reduction: Given the projection matrices $U_d|_{d=1}^{M}$ and an observed tensor $T \in R^{l_1 \times l_2 \times \cdots \times l_M}$ in the high-dimensional space, how do we find the corresponding latent tensor $X \in R^{l_1' \times l_2' \times \cdots \times l_M'}$ in the low-dimensional space? From tensor algebra, the dimension reduction is given by $X = T \prod_{d=1}^{M} \times_d U_d^T$. However, this method lacks the probabilistic perspective. Under the proposed decoupled probabilistic model, $X$ is obtained by maximizing $p(X \mid T) \propto \prod_{d=1}^{M} p(x_d \mid t_d)$. The dimension reduction is

$X = (T - \mathcal{M}) \prod_{d=1}^{M} \times_d \big(M_d^{-1} U_d^T\big)$.  (16)
Data Reconstruction: Given the projection matrices $U_d|_{d=1}^{M}$ and the latent tensor $X \in R^{l_1' \times l_2' \times \cdots \times l_M'}$ in the low-dimensional space, how do we approximate the corresponding observed tensor $T \in R^{l_1 \times l_2 \times \cdots \times l_M}$ in the high-dimensional space? Based on (16), the data reconstruction procedure is given by

$\hat{T} = X \prod_{d=1}^{M} \times_d \big(U_d (U_d^T U_d)^{-1} M_d\big) + \mathcal{M}$.  (17)

The reconstruction error is given by $\|\hat{T} - T\|_{Fro}$.
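A minimal numpy sketch of (16) and (17), assuming a helper for the mode-$d$ product; `Us`, `sigma2s`, and `Mean` stand for $U_d|_{d=1}^{M}$, $\sigma_d^2$, and the mean tensor $\mathcal{M}$ (the names are ours):

```python
import numpy as np

def mode_product(T, A, d):
    """Mode-d product T x_d A: contracts A's second axis with mode d of T."""
    out = np.tensordot(A, T, axes=(1, d))    # contracted mode moves to front
    return np.moveaxis(out, 0, d)

def reduce_dim(T, Us, sigma2s, Mean):
    """Dimension reduction (16): X = (T - Mean) prod_d x_d (M_d^{-1} U_d^T)."""
    X = T - Mean
    for d, (U, s2) in enumerate(zip(Us, sigma2s)):
        Md = U.T @ U + s2 * np.eye(U.shape[1])   # M_d = U_d^T U_d + sigma_d^2 I
        X = mode_product(X, np.linalg.inv(Md) @ U.T, d)
    return X

def reconstruct(X, Us, sigma2s, Mean):
    """Reconstruction (17): T_hat = X prod_d x_d (U_d (U_d^T U_d)^{-1} M_d) + Mean."""
    T_hat = X
    for d, (U, s2) in enumerate(zip(Us, sigma2s)):
        Md = U.T @ U + s2 * np.eye(U.shape[1])
        T_hat = mode_product(T_hat, U @ np.linalg.inv(U.T @ U) @ Md, d)
    return T_hat + Mean
```

Note that as every $\sigma_d^2 \to 0$, (16) degenerates to projection with the pseudoinverse of each $U_d$, and (17) then inverts it exactly on noiseless data.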
AIC and BIC are popular methods for model selection in statistics. However, they were developed for vector data, whereas in the proposed PTA the data are in tensor form. Therefore, it is important to find a suitable way to utilize AIC and BIC for tensor-based learning models.
In PTA, the conventional AIC and BIC can be applied to determine the size of $U_d|_{d=1}^{M}$. An exhaustive search based on AIC (BIC) is applied for model selection. In detail, for AIC-based model selection, we need to calculate the AIC score

798 D. Tao et al.

$J_{AIC}(U_d, l_d', \sigma_d^2) = 2\big(l_d l_d' - l_d'(l_d' - 1)/2 + 1\big) + n l_{\neq d} \Big[\log\det\big(U_d U_d^T + \sigma_d^2 I\big) + \mathrm{tr}\big((U_d U_d^T + \sigma_d^2 I)^{-1} S_d\big)\Big]$  (18)

for each mode, i.e., $\prod_{d=1}^{M}(l_d - 1)$ times, because the size $l_d'$ of each projection matrix $U_d$ changes from 1 to $l_d - 1$. In the determination stage, the optimal $l_d'^{*}$ is

$l_d'^{*} = \arg\min_{l_d'} J_{AIC}(U_d, l_d', \sigma_d^2)$,  (19)

where $1 \le l_d' \le l_d - 1$. For BIC-based model selection in PTA, we have a definition similar to AIC:
$J_{BIC}(U_d, l_d', \sigma_d^2) = \log(n l_{\neq d}) \big(l_d l_d' - l_d'(l_d' - 1)/2 + 1\big) + n l_{\neq d} \Big[\log\det\big(U_d U_d^T + \sigma_d^2 I\big) + \mathrm{tr}\big((U_d U_d^T + \sigma_d^2 I)^{-1} S_d\big)\Big]$  (20)

for each mode, i.e., $\prod_{d=1}^{M}(l_d - 1)$ times. In the determination stage, the optimal $l_d'^{*}$ is

$l_d'^{*} = \arg\min_{l_d'} J_{BIC}(U_d, l_d', \sigma_d^2)$,  (21)

where $1 \le l_d' \le l_d - 1$.

3 Empirical Study

In this section, we utilize a synthetic data model to evaluate BIC PTA in terms of accuracy for model selection. For AIC PTA, the experimental results are very similar. The accuracy is measured by the model selection error
$\sum_{d=1}^{M} |l_d'^{*} - l_d'|$. Here, $l_d'$ is the real model, i.e., the real dimension of the $d$-th mode of the unobserved latent tensor; and $l_d'^{*}$ is the selected model, i.e., the dimension of the $d$-th mode of the unobserved latent tensor selected by BIC PTA. A multilinear transformation is applied to map the tensors from the low-dimensional space $R^{l_1' \times l_2' \times \cdots \times l_M'}$ to the high-dimensional space $R^{l_1 \times l_2 \times \cdots \times l_M}$ by

$T_i = X_i \prod_{d=1}^{M} \times_d U_d + \mathcal{M} + \varsigma E_i$,

where $X_i \in R^{l_1' \times l_2' \times \cdots \times l_M'}$ and every entry of every unobserved latent tensor $X_i$ is generated from a single Gaussian with mean zero and variance 1, i.e., $N(0, 1)$; $E_i$ is the noise tensor and every entry is drawn from $N(0, 1)$; $\varsigma$ is a scalar, which we set to 0.01; the mean tensor $\mathcal{M} \in R^{l_1 \times l_2 \times \cdots \times l_M}$ is a random tensor and every entry in $\mathcal{M}$ is drawn from the uniform distribution on the interval $[0, 1]$; the projection matrices $U_d|_{d=1}^{M} \in R^{l_d \times l_d'}$ are random matrices and every entry in $U_d|_{d=1}^{M}$ is drawn from the uniform distribution on the interval $[0, 1]$; and $i$ denotes the $i$-th tensor measurement.
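The synthetic data generator described above can be sketched as follows (numpy; `generate_measurements` and its argument names are ours, with $\varsigma$ written as `zeta`):

```python
import numpy as np

def mode_product(T, A, d):
    """Mode-d product T x_d A: contracts A's second axis with mode d of T."""
    out = np.tensordot(A, T, axes=(1, d))
    return np.moveaxis(out, 0, d)

def generate_measurements(n, dims, latent_dims, zeta=0.01, seed=0):
    """Draw n measurements T_i = X_i prod_d x_d U_d + Mean + zeta * E_i."""
    rng = np.random.default_rng(seed)
    # projection matrices U_d in R^{l_d x l_d'}, entries ~ U[0, 1]
    Us = [rng.uniform(size=(l, lp)) for l, lp in zip(dims, latent_dims)]
    Mean = rng.uniform(size=dims)                # mean tensor, entries ~ U[0, 1]
    Ts = []
    for _ in range(n):
        T = rng.standard_normal(latent_dims)     # latent tensor X_i, ~ N(0, 1)
        for d, U in enumerate(Us):
            T = mode_product(T, U, d)
        E = rng.standard_normal(dims)            # noise tensor E_i, ~ N(0, 1)
        Ts.append(T + Mean + zeta * E)
    return Ts, Us, Mean

# first-experiment setting: M = 2, l_1 = 8, l_1' = 3, l_2 = 6, l_2' = 2
Ts, Us, Mean = generate_measurements(10, (8, 6), (3, 2))
```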
Every block corresponds to a BIC score; the darker the block, the smaller the BIC score. Based on this figure, we determine $l_1'^{*} = 3$ and $l_2'^{*} = 2$ from the BIC scores obtained by PTA, and the model selection error is 0. Figure 4 shows the Hinton diagrams of the first and the second projection matrices in the left and the right sub-figures, respectively. The projection matrices are obtained from PTA by setting $l_1'^{*} = 3$ and $l_2'^{*} = 5$.
In the first experiment, the data generator gives 10 measurements by setting $M = 2$, $l_1 = 8$, $l_1' = 3$, $l_2 = 6$, and $l_2' = 2$. To determine $l_1'^{*}$ and $l_2'^{*}$ based on BIC for PTA, we need to conduct PTA $(l_1 - 1)(l_2 - 1)$ times and obtain two BIC score matrices, for the first-mode projection matrix $U_1$ and the second-mode projection matrix $U_2$, respectively, as shown in Figure 3. In this figure, every block corresponds to a BIC score, and the darker the block, the smaller the corresponding BIC score. We use a light rectangle to highlight the darkest block in each BIC score matrix; this block corresponds to the smallest value. In the first BIC score matrix, as shown in the left sub-figure of Figure 3, the smallest value is located at $(3, 5)$. Because this BIC score matrix is calculated for the first-mode projection matrix based on (20), we can set $l_1'^{*} = 3$ according to (21). Similar to the determination of $l_1'^{*}$, we determine $l_2'^{*} = 2$ according to the second BIC score matrix, as shown in the right sub-figure of Figure 3, because the smallest value is located at $(7, 2)$. For this example, the model selection error is $\sum_{d=1}^{2} |l_d'^{*} - l_d'| = 0$.

We repeat the experiments 30 times with a setting similar to the first experiment in this section, but $l_1$, $l_1'$, $l_2$, and $l_2'$ are randomly set with the following requirements: $6 \le l_1, l_2 \le 10$, $2 \le l_1', l_2' \le 5$, $l_1' < l_1$, and $l_2' < l_2$. The total model selection error for BIC PTA is 0. We also conduct 30 experiments for third-order tensors, with a similar setting as described above, in which $l_1$, $l_1'$, $l_2$, $l_2'$, $l_3$, and $l_3'$ are randomly set.
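A per-mode sketch of the BIC-based selection in (20)–(21): the free-parameter count and the closed-form probabilistic-PCA fit used here to obtain $U_d$ and $\sigma_d^2$ for each candidate $l_d'$ are our assumptions, standing in for the fitted PTA parameters, and the search is run independently per mode rather than $\prod_d (l_d - 1)$ times.

```python
import numpy as np

def bic_score(U, sigma2, S, n_eff):
    """BIC score of one mode, following (20): a log(n)-weighted parameter
    count plus n_eff * [log det C + tr(C^{-1} S)], with C = U U^T + sigma2 I."""
    l, lp = U.shape
    C = U @ U.T + sigma2 * np.eye(l)
    k = l * lp - lp * (lp - 1) // 2 + 1          # assumed free-parameter count
    _, logdet = np.linalg.slogdet(C)
    return k * np.log(n_eff) + n_eff * (logdet + np.trace(np.linalg.solve(C, S)))

def select_dimension(T_mat):
    """Pick l_d'^* = argmin over 1 <= l_d' <= l_d - 1 of the BIC score (21).
    T_mat holds the mode-d vectors as rows; a closed-form probabilistic-PCA
    fit stands in for the PTA parameters of each candidate model."""
    N, l = T_mat.shape
    Tc = T_mat - T_mat.mean(axis=0)
    S = Tc.T @ Tc / N                            # mode-d sample covariance S_d
    w, V = np.linalg.eigh(S)
    w, V = w[::-1], V[:, ::-1]                   # eigenvalues in descending order
    best_score, best_lp = np.inf, 1
    for lp in range(1, l):
        sigma2 = w[lp:].mean()                   # ML estimate of the noise variance
        U = V[:, :lp] * np.sqrt(np.maximum(w[:lp] - sigma2, 0.0))
        score = bic_score(U, sigma2, S, N)
        if score < best_score:
            best_score, best_lp = score, lp
    return best_lp
```

On data with a clear low-dimensional structure and small noise, the argmin reproduces the true latent dimension, mirroring the zero model selection error reported above.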