Fig. 1. General 3D PARAFAC model described by the set of matrix equations $\mathbf{Y}_q = \mathbf{A}\mathbf{D}_q\mathbf{X} + \mathbf{N}_q$, $q = 1, 2, \ldots, Q$, where $\mathbf{D}_q$ is a diagonal matrix that holds on its main diagonal the $q$-th row of $\mathbf{D}$. In the special case of the SNTF, we impose nonnegativity and additional constraints, with $I = T = Q$, $\mathbf{A} = \mathbf{D} = \mathbf{G} \in \mathbb{R}^{I \times J}$, and $\mathbf{X} = \mathbf{G}^T$.

A super-symmetric tensor is a tensor whose entries are invariant under any permutation of the indices. For example, a third-order super-symmetric tensor $\mathcal{Y} \in \mathbb{R}^{I \times T \times Q}$ (with $I = T = Q$) has $y_{itq} = y_{iqt} = y_{tiq} = y_{tqi} = y_{qit} = y_{qti}$. Super-symmetric tensors arise naturally in multi-way clustering, where they represent generalized affinity tensors, in higher-order statistics, and in blind source separation. Pierre Comon [20] has shown a nice relationship between super-symmetric tensors and polynomials. Zass and Shashua applied them to multi-way clustering problems [6,19,7], and Hazan et al. developed multiplicative algorithms for the NTF [2].

We formulate the SNTF decomposition of a third-order super-symmetric tensor $\mathcal{Y} \in \mathbb{R}^{I \times I \times I}$
as three identical sparse nonnegative matrices $\mathbf{G} = [\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_J] \in \mathbb{R}^{I \times J}$ with $J \ll I$, according to the following factorization:

$$\mathcal{Y} = \sum_{j=1}^{J} \mathbf{g}_j \circ \mathbf{g}_j \circ \mathbf{g}_j + \mathcal{N}, \qquad (1)$$

where $\mathbf{g}_j \in \mathbb{R}^I$ is the $j$-th column vector of the matrix $\mathbf{G}$, the operator $\circ$ denotes the outer product (note that if $\mathbf{u}, \mathbf{v}, \mathbf{w}$ are vectors, then $[\mathbf{u} \circ \mathbf{v} \circ \mathbf{w}]_{ijq} = u_i v_j w_q$), and $\mathcal{N}$ is a tensor representing error. The SNTF model can be described in the equivalent matrix form as

$$\mathbf{Y}_q = \mathbf{G}\mathbf{D}_q\mathbf{G}^T + \mathbf{N}_q, \qquad (q = 1, 2, \ldots, I) \qquad (2)$$
where $\mathbf{Y}_q = \mathcal{Y}_{:,:,q} = [y_{itq}] \in \mathbb{R}^{I \times I}_+$ are the frontal slices of the given tensor $\mathcal{Y} \in \mathbb{R}^{I \times I \times I}_+$, $I = Q = T$ is the number of (horizontal, vertical, frontal) slices, $\mathbf{G} = [g_{ij}] \in \mathbb{R}^{I \times J}_+$ is the unknown matrix (super-common factor) to be estimated, $\mathbf{D}_q \in \mathbb{R}^{J \times J}_+$ is a diagonal matrix that holds the $q$-th row of $\mathbf{G}$ on its main diagonal, and $\mathbf{N}_q = \mathcal{N}_{:,:,q} \in \mathbb{R}^{I \times I}$ is the $q$-th frontal slice of a tensor $\mathcal{N} \in \mathbb{R}^{I \times I \times I}$ (not necessarily super-symmetric) representing error or noise, depending upon the application. The above algebraic system can be represented in an equivalent scalar form as follows:

$$y_{itq} = z_{itq} + n_{itq} = \sum_{j=1}^{J} g_{ij}\, g_{tj}\, g_{qj} + n_{itq}. \qquad (3)$$
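To make the model concrete, the following NumPy sketch (our own illustration; the dimensions and random seed are arbitrary) builds a noiseless super-symmetric tensor from a nonnegative factor $\mathbf{G}$ and checks both the permutation invariance of its entries and the slice form (2):

import numpy as np

rng = np.random.default_rng(0)
I, J = 10, 3
G = rng.random((I, J))                      # nonnegative factor, columns g_j

# Model (1) without noise: y_itq = sum_j g_ij g_tj g_qj
Y = np.einsum('ij,tj,qj->itq', G, G, G)

# Super-symmetry: entries are invariant under any permutation of the indices
assert np.allclose(Y, Y.transpose(1, 0, 2))
assert np.allclose(Y, Y.transpose(2, 1, 0))

# Frontal slices satisfy (2): Y_q = G D_q G^T with D_q = diag(q-th row of G)
q = 4
assert np.allclose(Y[:, :, q], G @ np.diag(G[q, :]) @ G.T)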
The objective is to estimate a sparse matrix $\mathbf{G}$, subject to constraints such as scaling to unit-length vectors and nonnegativity, and possibly other natural constraints such as orthogonality, sparseness, and/or smoothness of all or some of the columns $\mathbf{g}_j$. The $ij$-th element of the matrix $\mathbf{G}$ is denoted by $g_{ij}$, and its $j$-th column by $\mathbf{g}_j$; $y_{itq} = [\mathbf{Y}_q]_{it}$ denotes the $it$-th element of the $q$-th frontal slice $\mathbf{Y}_q$, and $z_{itq} = \sum_{j=1}^{J} g_{ij}\, g_{tj}\, g_{qj}$ with
$(i = 1, 2, \ldots, I;\ t = 1, 2, \ldots, I;\ q = 1, 2, \ldots, I)$.

2 Multiplicative SNTF Algorithms

2.1 Generalized Alpha Divergence

The most widely known and often used adaptive algorithms for NTF/NMF, and also for the SNTF, are based on alternating minimization of the squared Euclidean distance and the generalized Kullback-Leibler divergence [15,13,9]. In this paper, we propose to use a more general cost function: the alpha divergence. The 3D generalized alpha divergence can be defined for our purpose as follows [1]:

$$D_A^{(\alpha)}(\mathcal{Y}\,\|\,\mathcal{Z}) =
\begin{cases}
\displaystyle\sum_{itq} \frac{y_{itq}\left[\left(y_{itq}/z_{itq}\right)^{\alpha-1} - 1\right]}{\alpha(\alpha-1)} - \frac{y_{itq} - z_{itq}}{\alpha}, & \alpha \neq 0, 1,\\[1ex]
\displaystyle\sum_{itq} y_{itq}\ln\frac{y_{itq}}{z_{itq}} - y_{itq} + z_{itq}, & \alpha = 1,\\[1ex]
\displaystyle\sum_{itq} z_{itq}\ln\frac{z_{itq}}{y_{itq}} + y_{itq} - z_{itq}, & \alpha = 0,
\end{cases} \qquad (4)$$

where $y_{itq} = [\mathcal{Y}]_{itq}$ and $z_{itq} = [\mathbf{G}\mathbf{D}_q\mathbf{G}^T]_{it}$ for $(i = 1, 2, \ldots, I)$, $(t = 1, 2, \ldots, I)$, $(q = 1, 2, \ldots, I)$.

The choice of the parameter $\alpha \in \mathbb{R}$ depends on the statistical distribution of the noise and the data. We recall that, as special cases of the alpha divergence for $\alpha = 2, 0.5, -1$, we obtain Pearson's chi-squared, the Hellinger, and Neyman's chi-squared distances, respectively, while for $\alpha = 1$ and $\alpha = 0$ the divergence has to be defined by the limits of the first case of (4) as $\alpha \to 1$ and $\alpha \to 0$, respectively. When these limits are evaluated, one obtains the generalized Kullback-Leibler divergences given by the second and third cases of (4).
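In code, the three branches of (4) can be evaluated directly. The following NumPy sketch is our own illustration; the small eps safeguard against zero entries is an assumption, not part of the original formulation:

import numpy as np

def alpha_divergence(Y, Z, alpha, eps=1e-12):
    # Generalized alpha divergence (4) between nonnegative tensors Y and Z.
    # The alpha = 1 and alpha = 0 branches implement the limits of the generic case.
    Y = Y + eps
    Z = Z + eps
    if alpha == 1:                              # generalized Kullback-Leibler
        return np.sum(Y * np.log(Y / Z) - Y + Z)
    if alpha == 0:                              # dual Kullback-Leibler
        return np.sum(Z * np.log(Z / Y) + Y - Z)
    return np.sum(Y * ((Y / Z) ** (alpha - 1) - 1) / (alpha * (alpha - 1))
                  - (Y - Z) / alpha)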
The gradient of the alpha divergence (4), for $\alpha \neq 0$, can be expressed in the compact form

$$\frac{\partial D_A^{(\alpha)}}{\partial g_{ij}} = \frac{1}{\alpha}\sum_{tq} g_{qj}\, g_{tj}\left[1 - \left(\frac{y_{itq}}{z_{itq}}\right)^{\alpha}\right], \qquad \alpha \neq 0. \qquad (5)$$

However, instead of applying the standard gradient descent here, we use a projected (nonlinearly transformed) gradient approach, which can be considered as a generalization of the exponentiated gradient:

$$\Phi(g_{ij}) \leftarrow \Phi(g_{ij}) - \eta_{ij}\frac{\partial D_A^{(\alpha)}}{\partial g_{ij}}, \qquad (6)$$

where $\Phi(x)$ is a suitably chosen function. Hence, we have

$$g_{ij} \leftarrow \Phi^{-1}\!\left(\Phi(g_{ij}) - \eta_{ij}\frac{\partial D_A^{(\alpha)}}{\partial g_{ij}}\right). \qquad (7)$$
It can be shown that using such a nonlinear scaling or transformation provides a stable solution, and that the gradients are much better behaved in the $\Phi$ space. In our case, we employ $\Phi(x) = x^{\alpha}$ (for $\alpha = 0$ we use $\Phi(x) = \ln(x)$ instead) and choose the learning rates $\eta_{ij} = \alpha\Phi(g_{ij}) / \sum_{tq} g_{qj}\, g_{tj}$, which leads to the generalized multiplicative alpha algorithm

$$g_{ij} \leftarrow g_{ij}\left[\frac{\sum_{tq} g_{qj}\, g_{tj}\,(y_{itq}/z_{itq})^{\alpha}}{\sum_{tq} g_{qj}\, g_{tj}}\right]^{1/\alpha}, \qquad (8)$$

with normalization of the columns of $\mathbf{G}$ to unit length at each iteration, i.e., $g_{ij} \leftarrow g_{ij} / \sum_{p=1}^{I} g_{pj}$. This SNTF algorithm can be considered as a generalization of the EMML algorithm (for $\alpha = 1$) proposed in [2,6]. We may apply nonlinear projections or filtering via suitable nonlinear monotonic functions which increase or decrease the sparseness. In the simplest case, we can apply the elementwise transformation $g_{tj} \leftarrow (g_{tj})^{1+\alpha_{sp}}$, where $\alpha_{sp}$ is a small coefficient, typically from 0.001 to 0.005, taken positive if we want to increase the sparseness of an estimated component and negative if we want to decrease it. Hence, the generalized alpha algorithm for the SNTF with sparsity control can take the following form:
$$g_{ij} \leftarrow \left[g_{ij}\left(\frac{\sum_{tq} g_{qj}\, g_{tj}\,(y_{itq}/z_{itq})^{\alpha}}{\sum_{tq} g_{qj}\, g_{tj}}\right)^{\omega/\alpha}\right]^{1+\alpha_{sp}}, \qquad (9)$$

where $\omega$ is an over-relaxation parameter (typically in the range $(0, 2)$) which controls the convergence speed, and $\alpha_{sp}$ is a small parameter which controls the sparsity of the estimated matrix $\mathbf{G}$.
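A hedged NumPy sketch of one sweep of update (9) follows; the function name and default values are our own illustrative choices, $\alpha \neq 0$ is assumed, and the columns are normalized with the scaling given after (8):

import numpy as np

def alpha_update(Y, G, alpha=2.0, omega=1.0, alpha_sp=0.002, eps=1e-12):
    # One sweep of the multiplicative alpha algorithm (9) with sparsity control.
    # Y: I x I x I data tensor; G: current I x J nonnegative factor; alpha != 0.
    Z = np.einsum('ij,tj,qj->itq', G, G, G) + eps          # model z_itq
    ratio = ((Y + eps) / Z) ** alpha
    num = np.einsum('itq,tj,qj->ij', ratio, G, G)          # sum_tq g_qj g_tj (y/z)^alpha
    den = np.einsum('tj,qj->j', G, G)                      # sum_tq g_qj g_tj
    G = (G * (num / den) ** (omega / alpha)) ** (1.0 + alpha_sp)
    return G / G.sum(axis=0, keepdims=True)                # normalize columns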
2.2 SMART Algorithm

Alternative multiplicative SNTF algorithms can be derived using exponentiated gradient (EG) descent updates instead of the standard additive gradient descent. For example, by using the alpha divergence (4) for $\alpha = 0$, we have

$$g_{ij} \leftarrow g_{ij}\exp\!\left(-\tilde{\eta}_j\frac{\partial D_A^{(0)}}{\partial g_{ij}}\right), \qquad (10)$$

$$\frac{\partial D_A^{(0)}}{\partial g_{ij}} = \sum_{tq} g_{qj}\, g_{tj}\left(\ln z_{itq} - \ln y_{itq}\right). \qquad (11)$$

Hence, we obtain the simple multiplicative learning rules:

$$g_{ij} \leftarrow g_{ij}\exp\!\left(\sum_{tq} \eta_{ij}\, g_{qj}\, g_{tj}\ln\frac{y_{itq}}{z_{itq}}\right) = g_{ij}\prod_{tq}\left(\frac{y_{itq}}{z_{itq}}\right)^{\eta_{ij}\, g_{qj}\, g_{tj}}. \qquad (12)$$
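For illustration, a NumPy sketch of one exponentiated-gradient sweep (12) is given below; it anticipates the column-wise stabilizing learning rate discussed in the next paragraph, and all names and safeguards are our own assumptions:

import numpy as np

def smart_update(Y, G, omega=1.0, eps=1e-12):
    # One SMART-type exponentiated-gradient sweep, cf. (12),
    # with eta_ij = eta_j = omega * (sum_t g_tj)^(-2).
    Z = np.einsum('ij,tj,qj->itq', G, G, G) + eps
    log_ratio = np.log((Y + eps) / Z)
    grad = np.einsum('itq,tj,qj->ij', log_ratio, G, G)     # sum_tq g_qj g_tj ln(y/z)
    eta = omega / (G.sum(axis=0) ** 2 + eps)               # per-column learning rate
    return G * np.exp(eta[None, :] * grad)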
The nonnegative learning rates $\eta_{ij}$ can take different forms. Typically, in order to guarantee stability of the algorithm, we assume $\eta_{ij} = \tilde{\eta}_j = \omega\left(\sum_{t=1}^{T} g_{tj}\right)^{-2}$, where $\omega \in (0, 2)$ is an over-relaxation parameter. The above SNTF multiplicative algorithm can be considered as an alternating minimization/projection extension of the well-known SMART (Simultaneous Multiplicative Algebraic Reconstruction Technique) [11,21].

2.3 Generalized Beta Divergence

The generalized beta divergence can be considered as a complementary cost function to the generalized alpha divergence and can be defined as follows:

$$D_B^{(\beta)}(\mathcal{Y}\,\|\,\mathcal{Z}) =
\begin{cases}
\displaystyle\sum_{itq} y_{itq}\,\frac{y_{itq}^{\beta} - z_{itq}^{\beta}}{\beta} - \frac{y_{itq}^{\beta+1} - z_{itq}^{\beta+1}}{\beta+1}, & \beta > 0,\\[1ex]
\displaystyle\sum_{itq} y_{itq}\ln\!\left(\frac{y_{itq}}{z_{itq}}\right) - y_{itq} + z_{itq}, & \beta = 0,\\[1ex]
\displaystyle\sum_{itq} \ln\!\left(\frac{z_{itq}}{y_{itq}}\right) + \frac{y_{itq}}{z_{itq}} - 1, & \beta = -1.
\end{cases} \qquad (13)$$
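As with the alpha divergence, (13) is straightforward to evaluate in code; the sketch below is ours (only the three listed branches are covered, and the eps safeguard is an assumption):

import numpy as np

def beta_divergence(Y, Z, beta, eps=1e-12):
    # Generalized beta divergence (13) between nonnegative tensors Y and Z.
    Y = Y + eps
    Z = Z + eps
    if beta == 0:                               # Kullback-Leibler limit
        return np.sum(Y * np.log(Y / Z) - Y + Z)
    if beta == -1:                              # Itakura-Saito-type case
        return np.sum(np.log(Z / Y) + Y / Z - 1.0)
    return np.sum(Y * (Y ** beta - Z ** beta) / beta
                  - (Y ** (beta + 1) - Z ** (beta + 1)) / (beta + 1))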
The choice of the parameter $\beta$ depends on the statistical distribution of the data, and the beta divergence corresponds to the Tweedie models [22]. For example, the optimal choice of the parameter $\beta$ for a normal distribution is $\beta = 1$, for a gamma distribution $\beta = -1$, for a Poisson distribution $\beta = 0$, and for the compound Poisson $\beta \in (-1, 0)$. From the generalized beta divergence, we can derive various kinds of SNTF algorithms: multiplicative algorithms based on standard gradient descent or exponentiated gradient (EG) updates, and additive algorithms using the Projected Gradient (PG), Interior Point Gradient (IPG), quasi-Newton, and Fixed Point (FP) ALS approaches [23,24,25,26,27,28,9,13].

In order to derive a multiplicative SNTF learning algorithm for a sparse factorization, we compute the gradient of the regularized beta divergence (13), with the additional regularization (sparsification) term $J(\mathbf{G}) = \alpha_G\|\mathbf{G}\|_1 = \alpha_G\sum_{ij} g_{ij}$, as

$$\frac{\partial D_{Breg}^{(\beta)}}{\partial g_{ij}} = \sum_{tq}\left(z_{itq}^{\beta} - y_{itq}\, z_{itq}^{\beta-1}\right) g_{qj}\, g_{tj} + \alpha_G. \qquad (14)$$

Applying the simple (first-order) gradient descent approach

$$g_{ij} \leftarrow g_{ij} - \eta_{ij}\frac{\partial D_{Breg}^{(\beta)}}{\partial g_{ij}} \qquad (15)$$

and choosing suitable learning rates $\eta_{ij} = g_{ij} / \sum_{tq} z_{itq}^{\beta}\, g_{qj}\, g_{tj}$, we obtain the generalized SNTF beta algorithm

$$g_{ij} \leftarrow g_{ij}\,\frac{\left[\sum_{tq} g_{qj}\, g_{tj}\left(y_{itq}/z_{itq}^{1-\beta}\right) - \alpha_G\right]_{\varepsilon}}{\sum_{tq} z_{itq}^{\beta}\, g_{qj}\, g_{tj}}, \qquad (16)$$

where $[x]_{\varepsilon} = \max\{\varepsilon, x\}$ with a small $\varepsilon = 10^{-16}$ introduced to avoid zero and negative values. In the special case $\beta = 0$, the above algorithm simplifies to a generalized alternating EMML algorithm similar to the algorithm derived by Hazan et al. [2,29]:

$$g_{ij} \leftarrow g_{ij}\,\frac{\left[\sum_{tq} g_{qj}\, g_{tj}\left(y_{itq}/z_{itq}\right) - \alpha_G\right]_{\varepsilon}}{\sum_{tq} g_{qj}\, g_{tj}}. \qquad (17)$$
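A NumPy sketch of one sweep of the beta algorithm (16) might look as follows; names and defaults are illustrative, and $[\cdot]_{\varepsilon}$ is realized with np.maximum:

import numpy as np

def beta_update(Y, G, beta=1.0, alpha_G=0.0, eps=1e-16):
    # One sweep of the multiplicative beta algorithm (16).
    # Y: I x I x I data tensor; G: current I x J nonnegative factor.
    Z = np.einsum('ij,tj,qj->itq', G, G, G) + eps
    num = np.einsum('itq,tj,qj->ij', Y * Z ** (beta - 1.0), G, G) - alpha_G
    den = np.einsum('itq,tj,qj->ij', Z ** beta, G, G)
    return G * np.maximum(num, eps) / den                  # [x]_eps = max(eps, x)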
3 Simple Alternative Approaches for Super-Symmetric Tensor Decomposition

3.1 Averaging Approach

For tensors of large dimension ($I \gg 1$), the local algorithms derived above can be computationally very time consuming.
In this section, we propose an alternative simple approach which converts the problem to a simple tri-NMF model:

$$\bar{\mathbf{Y}} = \mathbf{G}\bar{\mathbf{D}}\mathbf{G}^T + \bar{\mathbf{N}}, \qquad (18)$$

where $\bar{\mathbf{Y}} = \sum_{q=1}^{Q}\mathbf{Y}_q \in \mathbb{R}^{I \times I}$, $\bar{\mathbf{D}} = \sum_{q=1}^{Q}\mathbf{D}_q = \mathrm{diag}\{\bar{d}_1, \bar{d}_2, \ldots, \bar{d}_J\}$, and $\bar{\mathbf{N}} = \sum_{q=1}^{Q}\mathbf{N}_q \in \mathbb{R}^{I \times I}$. The above system of linear algebraic equations can be represented in the equivalent scalar form $\bar{y}_{it} = \sum_j g_{ij}\, g_{tj}\,\bar{d}_j + \bar{n}_{it}$, or equivalently in the vector form $\bar{\mathbf{Y}} = \sum_j \mathbf{g}_j\,\bar{d}_j\,\mathbf{g}_j^T + \bar{\mathbf{N}}$, where the $\mathbf{g}_j$ are the columns of $\mathbf{G}$. Such a simple model is justified if the noise in the frontal slices is uncorrelated. It is interesting to note that the model can be written in the equivalent form

$$\bar{\mathbf{Y}} = \tilde{\mathbf{G}}\tilde{\mathbf{G}}^T + \bar{\mathbf{N}}, \qquad (19)$$

where $\tilde{\mathbf{G}} = \mathbf{G}\bar{\mathbf{D}}^{1/2}$, assuming that $\bar{\mathbf{D}} \in \mathbb{R}^{J \times J}$ is a non-singular matrix. Thus, the problem can be converted to a standard symmetric NMF problem, and any available NMF algorithm (multiplicative, FP-ALS, or PG) can be used to estimate the matrix $\tilde{\mathbf{G}}$. For example, by minimizing the regularized cost function

$$D(\bar{\mathbf{Y}}\,\|\,\tilde{\mathbf{G}}\tilde{\mathbf{G}}^T) = \frac{1}{2}\|\bar{\mathbf{Y}} - \tilde{\mathbf{G}}\tilde{\mathbf{G}}^T\|_F^2 + \alpha_G\|\tilde{\mathbf{G}}\|_1 \qquad (20)$$

and applying the FP-ALS approach, we obtain the simple algorithm

$$\tilde{\mathbf{G}} \leftarrow \left[(\bar{\mathbf{Y}}^T\tilde{\mathbf{G}} - \alpha_G\mathbf{E})(\tilde{\mathbf{G}}^T\tilde{\mathbf{G}})^{-1}\right]_+, \qquad (21)$$

subject to normalization of the columns of $\tilde{\mathbf{G}}$ to unit length in each iteration step, where $\mathbf{E}$ is a matrix of all ones of appropriate size.
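A compact NumPy sketch of the whole averaging approach (18)-(21) follows; the iteration count, the random initialization, and the small ridge added before the inversion are our own safeguards, not part of the original derivation:

import numpy as np

def averaging_symmetric_nmf(Y, J, alpha_G=0.0, n_iter=100, eps=1e-12):
    # Collapse the frontal slices as in (18), then iterate the FP-ALS update (21).
    Ybar = Y.sum(axis=2)                                   # Ybar = sum_q Y_q
    I = Ybar.shape[0]
    rng = np.random.default_rng(0)
    Gt = rng.random((I, J))                                # estimate of G * Dbar^(1/2)
    E = np.ones((I, J))
    for _ in range(n_iter):
        Gt = np.maximum(
            (Ybar.T @ Gt - alpha_G * E)
            @ np.linalg.inv(Gt.T @ Gt + eps * np.eye(J)),
            eps,
        )
        Gt /= np.linalg.norm(Gt, axis=0, keepdims=True)    # unit-length columns
    return Gt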
3.2 Row-Wise and Column-Wise Unfolding Approach

It is worth noting that the diagonal matrices $\mathbf{D}_q$ are scaling matrices that can be absorbed by the matrix $\mathbf{G}$. By defining the column-normalized matrices $\mathbf{G}_q = \mathbf{G}\mathbf{D}_q$, we can use the following simplified models:

$$\mathbf{Y}_q = \mathbf{G}_q\mathbf{G}^T + \mathbf{N}_q, \qquad (q = 1, \ldots, Q) \qquad (22)$$

or equivalently

$$\mathbf{Y}_q = \mathbf{G}\mathbf{G}_q^T + \mathbf{N}_q, \qquad (q = 1, \ldots, Q). \qquad (23)$$

These simplified models can be described by a single compact matrix equation using column-wise or row-wise unfolding as follows:

$$\mathbf{Y}_c = \mathbf{G}_c\mathbf{G}^T \qquad (24)$$

or

$$\mathbf{Y}_r = \mathbf{G}\,\mathbf{G}_r, \qquad (25)$$
where $\mathbf{Y}_c = \mathbf{Y}_r^T = [\mathbf{Y}_1; \mathbf{Y}_2; \ldots; \mathbf{Y}_Q] \in \mathbb{R}^{I^2 \times I}$ is the column-wise unfolded matrix of the slices $\mathbf{Y}_q$, and $\mathbf{G}_c = \mathbf{G}_r^T = [\mathbf{G}_1; \mathbf{G}_2; \ldots; \mathbf{G}_Q] \in \mathbb{R}^{I^2 \times J}$ is the column-wise unfolded matrix of the matrices $\mathbf{G}_q = \mathbf{G}\mathbf{D}_q$ $(q = 1, 2, \ldots, I)$. Using any efficient NMF algorithm (multiplicative, IPG, quasi-Newton, or FP-ALS) [23,24,25,26,27,28,9,13], we can estimate the matrix $\mathbf{G}$. For example, by minimizing the cost function

$$D(\mathbf{Y}_c\,\|\,\mathbf{G}_c\mathbf{G}^T) = \frac{1}{2}\|\mathbf{Y}_c - \mathbf{G}_c\mathbf{G}^T\|_F^2 + \alpha_G\|\mathbf{G}\|_1 \qquad (26)$$

and applying the FP-ALS approach, we obtain the iterative algorithm

$$\mathbf{G} \leftarrow \left[\left[\mathbf{Y}_c^T\mathbf{G}_c - \alpha_G\mathbf{E}\right]_+\left(\mathbf{G}_c^T\mathbf{G}_c\right)^{-1}\right]_+ \qquad (27)$$

or equivalently

$$\mathbf{G} \leftarrow \left[\left[\mathbf{Y}_r\mathbf{G}_r^T - \alpha_G\mathbf{E}\right]_+\left(\mathbf{G}_r\mathbf{G}_r^T\right)^{-1}\right]_+, \qquad (28)$$

where $\mathbf{G}_c = \mathbf{G}_r^T = [\mathbf{G}\mathbf{D}_1; \mathbf{G}\mathbf{D}_2; \ldots; \mathbf{G}\mathbf{D}_Q]$, $\mathbf{D}_q = \mathrm{diag}\{\mathbf{g}_q\}$, and $\mathbf{g}_q$ denotes the $q$-th row of $\mathbf{G}$.
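The unfolding update (27) can be sketched in NumPy as follows; the names are ours, and a small ridge is added before the matrix inversion purely as a numerical safeguard:

import numpy as np

def unfolding_update(Y, G, alpha_G=0.0, eps=1e-12):
    # One FP-ALS step of (27). Y: I x I x I tensor; G: current I x J factor.
    I, J = G.shape
    # G_c = [G D_1; ...; G D_I] with D_q = diag(q-th row of G); see (24)
    Gc = np.concatenate([G * G[q, :][None, :] for q in range(I)], axis=0)
    # Y_c = [Y_1; ...; Y_I], the column-wise unfolding of the frontal slices
    Yc = np.concatenate([Y[:, :, q] for q in range(I)], axis=0)
    E = np.ones((I, J))
    num = np.maximum(Yc.T @ Gc - alpha_G * E, 0.0)
    return np.maximum(num @ np.linalg.inv(Gc.T @ Gc + eps * np.eye(J)), eps)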
3.3 Semi-orthogonality Constraint

The matrix $\mathbf{G}$ is usually very sparse and may additionally satisfy orthogonality constraints. We can easily impose a semi-orthogonality constraint by incorporating the additional iteration:

$$\mathbf{G} \leftarrow \mathbf{G}\left(\mathbf{G}^T\mathbf{G}\right)^{-1/2}. \qquad (29)$$
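The inverse matrix square root in (29) can be computed from the eigendecomposition of $\mathbf{G}^T\mathbf{G}$; the following sketch reflects our own implementation choice:

import numpy as np

def semi_orthogonalize(G, eps=1e-12):
    # Semi-orthogonality step (29): G <- G (G^T G)^(-1/2).
    w, V = np.linalg.eigh(G.T @ G)                         # symmetric eigendecomposition
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T
    return G @ inv_sqrt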
3.4 Simulation Results

All the NTF algorithms presented in this paper have been tested on many difficult benchmarks for signals and images with various statistical distributions of signals and additive noise. Comparisons and simulation results will be presented at ICONIP 2007.

4 Conclusions and Discussion

We have proposed a generalized and flexible cost function (controlled by sparsity penalty/regularization terms) that allows us to derive a family of SNTF algorithms. The main objective and motivation of this paper is to derive simple multiplicative algorithms which are especially suitable both for very sparse representations and for highly over-determined cases. The basic advantage of the multiplicative algorithms is their simplicity and their relatively straightforward generalization to L-order tensors (L > 3). However, the multiplicative algorithms are relatively slow.

We found that the simple approaches which convert an SNTF problem to a symmetric NMF (SNMF) or symmetric tri-NMF (ST-NMF) problem provide more efficient and faster algorithms, especially for large-scale problems. Moreover, by imposing orthogonality constraints, we can drastically improve performance, especially for noisy data. Obviously, many challenging open issues remain, such as global convergence and the optimal choice of the associated parameters.

References

1. Amari, S.: Differential-Geometrical Methods in Statistics. Springer, Heidelberg (1985)
2. Hazan, T., Polak, S., Shashua, A.: Sparse image coding using a 3D non-negative tensor factorization. In: International Conference on Computer Vision (ICCV), pp. 50-57 (2005)
3. Workshop on Tensor Decompositions and Applications, CIRM, Marseille, France (2005)
4. Heiler, M., Schnoerr, C.: Controlling sparseness in non-negative tensor factorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 56-67. Springer, Heidelberg (2006)
5. Smilde, A., Bro, R., Geladi, P.: Multi-way Analysis: Applications in the Chemical Sciences. John Wiley and Sons, New York (2004)
6. Shashua, A., Zass, R., Hazan, T.: Multi-way clustering using super-symmetric non-negative tensor factorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 595-608. Springer, Heidelberg (2006)
7. Zass, R., Shashua, A.: A unifying approach to hard and probabilistic clustering. In: International Conference on Computer Vision (ICCV), Beijing, China (2005)
8. Sun, J., Tao, D., Faloutsos, C.: Beyond streams and graphs: dynamic tensor analysis. In: Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2006)
9. Berry, M., Browne, M., Langville, A., Pauca, P., Plemmons, R.: Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis (in press, 2006)
10. Cichocki, A., Zdunek, R., Amari, S.: Csiszár's divergences for non-negative matrix factorization: Family of new algorithms. In: Rosca, J.P., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 32-39. Springer, Heidelberg (2006)
11. Cichocki, A., Amari, S., Zdunek, R., Kompass, R., Hori, G., He, Z.: Extended SMART algorithms for non-negative matrix factorization. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds.) ICAISC 2006. LNCS (LNAI), vol. 4029, pp. 548-562. Springer, Heidelberg (2006)
12. Cichocki, A., Zdunek, R.: NTFLAB for Signal Processing. Technical report, Laboratory for Advanced Brain Signal Processing, BSI, RIKEN, Saitama, Japan (2006)
13. Dhillon, I., Sra, S.: Generalized nonnegative matrix approximations with Bregman divergences. In: Neural Information Processing Systems (NIPS), Vancouver, Canada, pp. 283-290 (2005)
14. Kim, M., Choi, S.: Monaural music source separation: Nonnegativity, sparseness, and shift-invariance. In: Rosca, J.P., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 617-624. Springer, Heidelberg (2006)
15. Lee, D.D., Seung, H.S.: Learning the parts of objects by nonnegative matrix factorization. Nature 401, 788-791 (1999)
16. Mørup, M., Hansen, L.K., Herrmann, C.S., Parnas, J., Arnfred, S.M.: Parallel factor analysis as an exploratory tool for wavelet transformed event-related EEG. NeuroImage 29, 938-947 (2006)
17. Miwakeichi, F., Martinez-Montes, E., Valdés-Sosa, P., Nishiyama, N., Mizuhara, H., Yamaguchi, Y.: Decomposing EEG data into space-time-frequency components using parallel factor analysis. NeuroImage 22, 1035-1045 (2004)
18. Zass, R., Shashua, A.: Nonnegative sparse PCA. In: Neural Information Processing Systems (NIPS), Vancouver, Canada (2006)
19. Zass, R., Shashua, A.: Doubly stochastic normalization for spectral clustering. In: Neural Information Processing Systems (NIPS), Vancouver, Canada (2006)
20. Comon, P.: Tensor decompositions: state of the art and applications. In: McWhirter, J.G., Proudler, I.K. (eds.) Institute of Mathematics and its Applications Conference on Mathematics in Signal Processing, pp. 18-20. Clarendon Press, Oxford, UK (2001)
21. Byrne, C.L.: Choosing parameters in block-iterative or ordered-subset reconstruction algorithms. IEEE Transactions on Image Processing 14, 321-327 (2005)
22. Minami, M., Eguchi, S.: Robust blind source separation by beta-divergence. Neural Computation 14, 1859-1886 (2002)
23. Cichocki, A., Zdunek, R., Choi, S., Plemmons, R., Amari, S.: Novel multi-layer nonnegative tensor factorization with sparsity constraints. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds.) ICANNGA 2007. LNCS, vol. 4432, pp. 271-280. Springer, Heidelberg (2007)
24. Cichocki, A., Zdunek, R.: Regularized alternating least squares algorithms for non-negative matrix/tensor factorizations. In: Liu, D., Fei, S., Hou, Z., Zhang, H., Sun, C. (eds.) ISNN 2007. LNCS, vol. 4493, pp. 793-802. Springer, Heidelberg (2007)
25. Cichocki, A., Zdunek, R., Amari, S.: New algorithms for non-negative matrix factorization in applications to blind source separation. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, vol. 5, pp. 621-624 (2006)
26. Cichocki, A., Zdunek, R., Choi, S., Plemmons, R., Amari, S.: Nonnegative tensor factorization using alpha and beta divergences. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Honolulu, Hawaii, USA, vol. III, pp. 1393-1396 (2007)
27. Zdunek, R., Cichocki, A.: Nonnegative matrix factorization with constrained second-order optimization. Signal Processing 87, 1904-1916 (2007)
28. Zdunek, R., Cichocki, A.: Nonnegative matrix factorization with quadratic programming. Neurocomputing (accepted, 2007)
29. Shashua, A., Hazan, T.: Non-negative tensor factorization with applications to statistics and computer vision. In: Proc. of the 22nd International Conference on Machine Learning (ICML), Bonn, Germany (2005)