Lecture Notes in Computer Science
Probabilistic Tensor Analysis with Akaike and Bayesian Information Criteria

Dacheng Tao¹, Jimeng Sun², Xindong Wu³, Xuelong Li⁴, Jialie Shen⁵,
Stephen J. Maybank⁴, and Christos Faloutsos²

¹ Department of Computing, Hong Kong Polytechnic University, Hong Kong
csdct@comp.polyu.edu.hk
² Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA
jimeng@cs.cmu.edu, christos+@cs.cmu.edu
³ Department of Computer Science, University of Vermont, Burlington, USA
xwu@cs.uvm.edu
⁴ School of Computer Science & Information Systems, Birkbeck, University of London, London, UK
xuelong@dcs.bbk.ac.uk, sjmaybank@dcs.bbk.ac.uk
⁵ School of Information Systems, Singapore Management University, Singapore
jlshen@smu.edu.sg

Abstract. From data mining to computer vision, from visual surveillance to biometrics research, from biomedical imaging to bioinformatics, and from multimedia retrieval to information management, a large amount of data is naturally represented by multidimensional arrays, i.e., tensors. However, conventional probabilistic graphical models with probabilistic inference only model data in vector format, even though probabilistic inference is very important in many statistical problems, e.g., model selection. Is it possible to construct multilinear probabilistic graphical models for tensor-format data and to conduct probabilistic inference, e.g., model selection, over them? This paper provides a positive answer based on the proposed decoupled probabilistic model by developing probabilistic tensor analysis (PTA), which selects a suitable model for tensor-format data based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Empirical studies demonstrate that PTA associated with AIC and BIC selects the correct number of models.

Keywords: Probabilistic Inference, Akaike Information Criterion, Bayesian Information Criterion, Probabilistic Principal Component Analysis, Tensor.

1 Introduction

In computer vision, data mining, and many other applications, objects are naturally represented by tensors, i.e., multidimensional arrays: gray-level face images in biometrics, colour images in scene classification, colour video shots (as shown in Figure 1) in multimedia information management, TCP flow records in computer networks, and the DBLP bibliography in data mining [6]. Therefore, tensor-based data modeling has become very popular, and a large number of learning models have been developed, from unsupervised learning to supervised learning, e.g., high order singular value decomposition [5] [1] [2], n-mode component analysis or tensor principal component
analysis (TPCA) [5] [10] [6], three-mode data principal component analysis [4], Tucker decomposition [9], tensorface [10], generalized low rank approximations of matrices [12], two-dimensional linear discriminant analysis [11], dynamic tensor analysis [6], and general tensor discriminant analysis [7].

Fig. 1. A colour video shot represented as a tensor: two indices are used for pixel locations; one index is used to locate the colour information; and the other index is used for time. The video shot comes from http://www-nlpir.nist.gov/projects/trecvid/.

However, all these tensor-based learning models lack systematic justification at the probabilistic level. Therefore, it is impossible to conduct conventional statistical tasks, e.g., model selection, over existing tensor-based learning models. The difficulty of applying probability theory, probabilistic inference, and probabilistic graphical modeling to tensor-based learning models comes from the gap between the tensor-based datum representation and the vector-based input required by traditional probabilistic models. Is it possible to apply probability theory and to develop probabilistic graphical models for tensors? Is it possible to apply statistical model selection over specific tensor-based learning models? To answer these questions, we narrow our focus to generalizing probabilistic principal component analysis (PPCA) [8], which is an important generative model [3] for subspace selection or dimension reduction for vectors. This generalization yields probabilistic tensor analysis (PTA), which is a generative model for tensors. The significance of PTA is as follows: 1) It provides a probabilistic analysis for tensors. Based on PTA, a probabilistic graphical model can be constructed, the dimension reduction procedure for tensors can be understood at the probabilistic level, and the parameter estimation can be formulated with the expectation maximization (EM) algorithm under the maximum likelihood framework [3]. 2) It provides statistical methods for model selection based on the Akaike and Bayesian information criteria (AIC and BIC) [3]. Conventional tensor-based subspace analysis offers no criterion for model selection, i.e., for determining the appropriate number of retained dimensions to model the original tensors. By
providing specific utilizations of AIC and BIC, model selection for tensor subspace analysis becomes possible; in PTA associated with AIC/BIC, the number of retained dimensions is chosen by an iterative procedure. 3) It provides a flexible framework for tensor data modeling. PTA assumes that entries in tensor measurements are drawn from multivariate normal distributions. This assumption can be changed for different applications, e.g., to multinomial/binomial distributions in sparse tensor modeling; with different assumptions, different dimension reduction algorithms can be developed for different applications.
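To make the tensor representation concrete, the following minimal numpy sketch builds a fourth-order tensor for a colour video shot as in Figure 1; the array sizes are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

# A colour video shot as a 4th-order tensor (M = 4 modes):
# two modes index pixel locations, one mode indexes the colour
# channels, and one mode indexes time (cf. Figure 1).
height, width, channels, frames = 120, 160, 3, 30  # illustrative sizes
video_shot = np.zeros((height, width, channels, frames))
print(video_shot.ndim, video_shot.shape)  # 4 (120, 160, 3, 30)

# A dataset D of n such observations is a set of tensors
# T_1, ..., T_n, each in R^{l_1 x l_2 x l_3 x l_4}.
n = 50
dataset = [np.random.rand(height, width, channels, frames) for _ in range(n)]
```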
2 Probabilistic Tensor Analysis

In this section, we first construct the latent tensor model, which uses a multilinear mapping to relate the observed tensors to unobserved latent tensors. Based on the proposed latent tensor model, we then develop probabilistic tensor analysis with dimension reduction and data reconstruction. A detailed description of tensor terminology can be found in [9].

2.1 Latent Tensor Model

Similar to the latent variable model [8], the latent tensor model multilinearly relates the high dimensional observation tensors $\mathcal{T}_i \in \mathbb{R}^{l_1 \times l_2 \times \cdots \times l_{M-1} \times l_M}$, for $1 \le i \le n$, to the corresponding latent tensors $\mathcal{X}_i \in \mathbb{R}^{l'_1 \times l'_2 \times \cdots \times l'_{M-1} \times l'_M}$ in the low dimensional space, i.e.,

$$\mathcal{T}_i = \mathcal{X}_i \prod_{d=1}^{M} \times_d U_d + \mathcal{M} + \mathcal{E}_i \,, \qquad (1)$$

where $U_d \in \mathbb{R}^{l_d \times l'_d}$ is the $d$-th projection matrix, $1 \le d \le M$; $\mathcal{M} = \left(\sum_{i=1}^{n} \mathcal{T}_i\right)/n$ is the mean tensor of all observed tensors $\mathcal{T}_i$; and $\mathcal{E}_i$ is the $i$-th residue tensor, every entry of which follows $N(0, \sigma^2)$. Moreover, the number of effective dimensions of the latent tensors is upper bounded by $l_d - 1$ for the $d$-th mode, i.e., $l'_d \le l_d - 1$ for the $d$-th projection matrix $U_d$. Here, $1 \le i \le n$ and $n$ is the number of observed tensors. The projection matrices $U_d$ ($1 \le d \le M$) construct the multilinear mapping between the observations and the latent tensors.
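As a complement to (1), here is a minimal numpy sketch of sampling observed tensors from the latent tensor model; the helper `mode_d_product`, the dimensions, and the noise level are illustrative assumptions, and the convention $U_d \in \mathbb{R}^{l_d \times l'_d}$ follows the reconstruction above:

```python
import numpy as np

def mode_d_product(X, U, d):
    """Mode-d product X x_d U: contracts mode d of X (size l'_d)
    with the columns of U in R^{l_d x l'_d}, yielding mode size l_d."""
    # tensordot contracts X's axis d with U's axis 1; the new axis
    # (appended last by tensordot) is then moved back to position d.
    return np.moveaxis(np.tensordot(X, U, axes=(d, 1)), -1, d)

rng = np.random.default_rng(0)
l = (20, 15, 10)   # observed mode sizes l_d (illustrative)
lp = (5, 4, 3)     # latent mode sizes l'_d <= l_d - 1
sigma = 0.1        # standard deviation of the residue entries

U = [rng.standard_normal((l[d], lp[d])) for d in range(len(l))]  # projection matrices
M_mean = rng.standard_normal(l)                                  # mean tensor M

def sample_observation():
    X = rng.standard_normal(lp)           # latent tensor, entries ~ N(0, 1)
    T = X
    for d in range(len(l)):               # T = X prod_d (x_d U_d)
        T = mode_d_product(T, U[d], d)
    E = sigma * rng.standard_normal(l)    # residue tensor, entries ~ N(0, sigma^2)
    return T + M_mean + E

T1 = sample_observation()
print(T1.shape)  # (20, 15, 10)
```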
2.2 Probabilistic Tensor Analysis

Armed with the latent tensor model and probabilistic principal component analysis (PPCA), a probabilistic treatment of TPCA is constructed by introducing hyper-parameters with the distribution $p\left(U_d|_{d=1}^{M}, \sigma^2 \,\middle|\, D\right)$ over the model parameters, where $D = \{\mathcal{T}_i\}|_{i=1}^{n}$ is the set containing all observed tensors. According to Bayes' theorem, the probabilistic modeling of TPCA, i.e., PTA, is defined by the predictive density

$$p\left(\mathcal{T} \mid D\right) = p\left(\mathcal{T} \,\middle|\, U_d|_{d=1}^{M}, \sigma^2\right) p\left(U_d|_{d=1}^{M}, \sigma^2 \,\middle|\, D\right), \qquad (2)$$
where $U_d|_{d=1}^{M} \in \mathbb{R}^{l_d \times l'_d}$, $p\left(\mathcal{T} \,\middle|\, U_d|_{d=1}^{M}, \sigma^2\right)$ is the predictive density under the given full probabilistic model, and $p\left(U_d|_{d=1}^{M}, \sigma^2 \,\middle|\, D\right)$ is the posterior probability. The model parameters $\left(U_d|_{d=1}^{M}, \sigma^2\right)$ can be obtained by applying maximum a posteriori (MAP) estimation.

In PTA, to obtain $\left(U_d|_{d=1}^{M}, \sigma^2\right)$, we have the following concerns: 1) The probabilistic distributions are defined over vectors, not tensors. Although the vectorization operation makes it possible to apply conventional probability theory and inference, the computational cost would be very high and almost intractable in practical applications, so it is important to obtain the probabilistic model in a computationally tractable way; the decoupled probabilistic model developed below significantly reduces the computational complexity. 2) How should the number of retained dimensions used to model the observed tensors be determined? In probability theory and statistics, the Akaike information criterion (AIC) [3] and the Bayesian information criterion (BIC) [3] are popular for model selection; however, both were developed for data represented by vectors, so it is important to generalize AIC and BIC to tensor data. The classical vector-data forms of both criteria are sketched below.

Based on the above discussion, obtaining the projection matrices $U_d|_{d=1}^{M}$ jointly is computationally intractable, because the model requires obtaining all projection matrices simultaneously. To construct a computationally tractable algorithm, we build a decoupled probabilistic model for (1), i.e., we obtain each projection matrix $U_d$ separately and form an alternating optimization procedure.
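For reference, the classical AIC and BIC for vector data are simple functions of the maximized log-likelihood; a minimal sketch follows (the tensor generalization is developed with (7) below):

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike information criterion: AIC = 2k - 2 log L,
    where k is the number of free model parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: BIC = k log(n) - 2 log L,
    where n is the number of observations."""
    return k * np.log(n) - 2 * log_likelihood
```

In both cases, the candidate model minimizing the criterion is retained.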
Fig. 2. The decoupled probabilistic graphical model of PTA: for each mode $d$, the projected samples $t_{d;j}$ depend on the latent variables $x_{d;j}$, the mode parameters $\mu_d$ and $\sigma_d^2$, the projection matrix $U_d$, and the remaining projection matrices $U_1, \ldots, U_{d-1}, U_{d+1}, \ldots, U_M$, over the $n$ observed tensors $\mathcal{T}_i$.
The decoupled predictive density $p\left(\mathcal{T} \,\middle|\, U_d|_{d=1}^{M}, \sigma^2\right)$ is defined as

$$p\left(\mathcal{T} \,\middle|\, U_d|_{d=1}^{M}, \sigma^2\right) \propto \prod_{d=1}^{M} p\left(\mathcal{T} \,\middle|\, U_d, \mu_d, \sigma_d^2\right), \qquad (3)$$

where $\mu_d \in \mathbb{R}^{l_d}$ is the mean vector for the $d$-th mode and $\sigma_d^2$ is the variance of the noise that models the residue on the $d$-th mode. The decoupled posterior distribution given the observed tensors $D$ is

$$p\left(U_d|_{d=1}^{M}, \sigma^2 \,\middle|\, D\right) \propto \prod_{d=1}^{M} p\left(U_d, \mu_d, \sigma_d^2 \,\middle|\, D, U_k|_{k \ne d, 1 \le k \le M}\right). \qquad (4)$$

Therefore, based on (3) and (4), the decoupled predictive density is

$$p\left(\mathcal{T} \mid D\right) \propto \prod_{d=1}^{M} p\left(\mathcal{T} \,\middle|\, U_d, \mu_d, \sigma_d^2\right) \times p\left(U_d, \mu_d, \sigma_d^2 \,\middle|\, D, U_k|_{k \ne d, 1 \le k \le M}\right). \qquad (5)$$
With (4) and (5), we have the decoupled probabilistic graphical model shown in Figure 2.

Consider the column space of the projected data set $D_d = \{t_{d;j}\}$, where the sample $t_{d;j} \in \mathbb{R}^{l_d}$ is the $j$-th column of $T_d = \mathrm{mat}_d\!\left(\mathcal{T} \prod_{k \ne d} \times_k U_k^T\right)$ and $T_d \in \mathbb{R}^{l_d \times \bar{l}_d}$ with $\bar{l}_d = \prod_{k \ne d} l'_k$; here $\mathrm{mat}_d(\cdot)$ denotes mode-$d$ matricization [9]. Over the $n$ observed tensors, $1 \le j \le n\bar{l}_d$. Correspondingly, let $x_{d;j} \in \mathbb{R}^{l'_d}$ be the $j$-th column of $X_d = \mathrm{mat}_d(\mathcal{X})$, the mode-$d$ matricization of the latent tensor. A sketch of this construction is given below.
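A minimal numpy sketch of the mode-$d$ construction above, under the conventions of this reconstruction ($U_k \in \mathbb{R}^{l_k \times l'_k}$, so $U_k^T$ projects mode $k$ down to $l'_k$); the helper names are assumptions:

```python
import numpy as np

def unfold(T, d):
    """Mode-d matricization mat_d(T): rows are indexed by mode d."""
    return np.moveaxis(T, d, 0).reshape(T.shape[d], -1)

def project_except(T, U, d):
    """Project T on every mode except d: T prod_{k != d} x_k U_k^T."""
    for k in range(T.ndim):
        if k != d:
            # U[k].T in R^{l'_k x l_k} maps mode k from size l_k to l'_k.
            T = np.moveaxis(np.tensordot(T, U[k].T, axes=(k, 1)), -1, k)
    return T

def mode_d_samples(tensors, U, d):
    """Stack the columns t_{d;j} of T_d over all n observed tensors.
    (The mean tensor M can be subtracted from each tensor beforehand;
    it is omitted here for brevity.)"""
    cols = [unfold(project_except(T, U, d), d) for T in tensors]
    return np.concatenate(cols, axis=1)  # shape: l_d x (n * lbar_d)
```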
Suppose $t_d \mid x_d \sim N\!\left(U_d x_d + \mu_d, \sigma_d^2 I\right)$ and the marginal distribution of the latent variable is $x \sim N(0, I)$. Then the marginal distribution of the observed projected mode-$d$ vector $t_d$ is also Gaussian, i.e., $t_d \sim N(\mu_d, C_d)$. The mean vector and the corresponding covariance matrix are $\mu_d = \left(\sum_{j=1}^{n\bar{l}_d} t_{d;j}\right)\big/\left(n\bar{l}_d\right)$ and $C_d = U_d U_d^T + \sigma_d^2 I$, respectively. Based on the properties of the Gaussian distribution, we have

$$x_d \mid t_d \sim N\!\left(M_d^{-1} U_d^T (t_d - \mu_d),\; \sigma_d^2 M_d^{-1}\right), \qquad (6)$$

where $M_d = U_d^T U_d + \sigma_d^2 I$.
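Eq. (6) transcribes directly into code; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def latent_posterior(t, U_d, mu_d, sigma2_d):
    """Posterior of x_d given t_d, eq. (6):
    mean = M_d^{-1} U_d^T (t_d - mu_d), cov = sigma_d^2 M_d^{-1},
    with M_d = U_d^T U_d + sigma_d^2 I."""
    lp = U_d.shape[1]                                  # latent size l'_d
    M_d = U_d.T @ U_d + sigma2_d * np.eye(lp)
    M_inv = np.linalg.inv(M_d)
    mean = M_inv @ U_d.T @ (t - mu_d)
    cov = sigma2_d * M_inv
    return mean, cov
```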
In this decoupled model, the objective is to find the $d$-th projection matrix $U_d$ by MAP with $U_k|_{k \ne d, 1 \le k \le M}$ given, i.e., to calculate $U_d$ by maximizing

$$\log p\left(U_d \mid D\right) \propto -\frac{n\bar{l}_d}{2}\left[\log \det\left(C_d\right) + \mathrm{tr}\!\left(C_d^{-1} S_d\right)\right], \qquad (7)$$

where $\log(a)$ is the natural logarithm of $a$ and $S_d$ is the sample covariance matrix of $t_d$. Eq. (7) benefits from the decoupled definition of the probabilistic model in (5). Based on (7), the total log posterior distribution is

$$L = \sum_{d=1}^{M} \log p\left(U_d \mid D\right) \propto -\sum_{d=1}^{M} \frac{n\bar{l}_d}{2}\left[\log \det\left(C_d\right) + \mathrm{tr}\!\left(C_d^{-1} S_d\right)\right].$$