Lecture Notes in Computer Science


Variable Selection for Multivariate Time Series Prediction with Neural Networks

Min Han and Ru Wei

School of Electronic and Information Engineering,

Dalian University of Technology, Dalian 116023, China

minhan@dlut.edu.cn

Abstract. This paper proposes a variable selection algorithm based on neural networks for multivariate time series prediction. A sensitivity analysis of the neural network error function with respect to the inputs is developed to quantify the saliency of each input variable. Input nodes with low sensitivity are then pruned along with their connections, which amounts to deleting the corresponding redundant variables. The proposed algorithm is tested on both computer-generated time series and practical observations. Experimental results show that the proposed algorithm outperforms other variable selection methods, achieving a larger reduction in the training data size and higher prediction accuracy.

Keywords: Variable selection, neural network pruning, sensitivity, multivariate prediction.

1 Introduction

Nonlinear and chaotic time series prediction is a practical technique for studying the characteristics of complicated dynamics from measurements. Multiple variables are usually required, since the output may depend not only on its own previous values but also on the past values of other variables. However, not all of the variables are equally important: some may be redundant or even irrelevant. If these unnecessary input variables are included in the prediction model, parameter estimation becomes more difficult, and the overall results may be poorer than if only the required inputs are used [1]. Variable selection addresses this problem by discarding the redundant variables, which reduces the number of input variables and the complexity of the prediction model.

A number of variable selection methods based on statistical or heuristic tools have been proposed, such as Principal Component Analysis (PCA) and Discriminant Analysis. These techniques attempt to reduce the dimensionality of the data by creating new variables that are linear combinations of the original ones. Their major difficulty is that the variable selection process is separated from the prediction process. Variable selection using neural networks is therefore attractive, since the variable selector can be adapted globally together with the predictor.

Variable selection with neural networks can be seen as a special case of architecture pruning [2], where pruning an input node is equivalent to removing the corresponding variable from the original data set. One approach to pruning is to estimate the sensitivity of the output to the exclusion of each unit. There are several ways to perform sensitivity analysis with neural networks. Most of them are weight-based [3], resting on the idea that weights connected to important variables attain large absolute values, while weights connected to unimportant variables attain values near zero. However, smaller weights usually result in smaller inputs to neurons and hence larger sigmoid derivatives, which increases the sensitivity of the output to the input. Mozer and Smolensky [4] introduced a method that estimates which units are least important so that they can be deleted during training. Gevrey et al. [5] compute the partial derivatives of the neural network output with respect to the input neurons and compare the performance of several methods for evaluating the relative contribution of the input variables.

This paper concentrates on a neural-network-based variable selection algorithm as the tool for determining which variables are to be discarded. A simple sensitivity criterion of the neural network error function with respect to each input is developed to quantify the saliency of each input variable. The input nodes are then arranged in decreasing order of sensitivity, so that the neural network can be pruned efficiently by discarding the last items, which have low sensitivity. The variable selection algorithm is applied to both computer-generated data and practical observations and is compared with the PCA variable reduction method.

The rest of this paper is organized as follows. Section 2 reviews the basic concepts of multivariate time series prediction and a statistical variable selection method. Section 3 explains the sensitivity analysis with neural networks in detail. Section 4 presents two simulation results. Section 5 concludes the paper.

2 Modeling Multivariate Chaotic Time Series

The basic idea of chaotic time series analysis is that a complex system can be described by a strange attractor in its phase space. Therefore, reconstructing the equivalent state space is usually the first step in chaotic time series prediction.

2.1 Multivariate Phase Space Reconstruction

Phase space reconstruction from observations can be accomplished by choosing a suitable embedding dimension and time delay. Consider an M-dimensional time series $\{X_i,\ i = 1, 2, \ldots, M\}$, where $X_i = [x_i(1), x_i(2), \ldots, x_i(N)]$ and N is the length of each scalar time series. As in the case of a univariate time series (where M = 1), the reconstructed phase space can be formed as [6]:

$$X(t) = [x_1(t),\ x_1(t-\tau_1),\ \ldots,\ x_1(t-(d_1-1)\tau_1),\ \ldots,\ x_M(t),\ x_M(t-\tau_M),\ \ldots,\ x_M(t-(d_M-1)\tau_M)] \tag{1}$$

where $t = L, L+1, \ldots, N$, $L = \max_{1 \le i \le M}(d_i - 1)\cdot\tau_i + 1$, and $\tau_i$ and $d_i$ ($i = 1, 2, \ldots, M$) are the time delays and embedding dimensions of each time series, respectively. The delay time $\tau_i$ can be calculated using the mutual information method, and the embedding dimension can be computed with the false nearest neighbor method.
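The construction in Eq. (1) can be sketched in Python as follows. The function name `embed_multivariate`, the list-based data layout and the toy series are assumptions made for illustration; the delays and embedding dimensions are supplied by hand here rather than estimated with mutual information or false nearest neighbors.

```python
def embed_multivariate(series, delays, dims):
    """Build the delay vectors X(t) of Eq. (1) from M scalar series.

    series: list of M lists of equal length N (x_i(1..N), stored 0-based)
    delays: list of M time delays tau_i
    dims:   list of M embedding dimensions d_i
    Returns (t_values, vectors) with t running from L to N (1-based).
    """
    n = len(series[0])
    # First usable time index: L = max_i (d_i - 1) * tau_i + 1
    start = max((d - 1) * tau for d, tau in zip(dims, delays)) + 1
    t_values, vectors = [], []
    for t in range(start, n + 1):
        row = []
        for x, tau, d in zip(series, delays, dims):
            # x_i(t), x_i(t - tau_i), ..., x_i(t - (d_i - 1) * tau_i)
            row.extend(x[t - 1 - k * tau] for k in range(d))
        t_values.append(t)
        vectors.append(row)
    return t_values, vectors


# Toy data, not taken from the paper.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [10, 20, 30, 40, 50, 60]
t_values, vectors = embed_multivariate([x1, x2], delays=[1, 2], dims=[3, 2])
print(t_values[0], vectors[0])
```

Each delay vector has length d_1 + d_2 = 5 in this toy case, and the first usable time index is L = 3 because both series need two past steps.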


According to Takens' embedding theorem, if $D = \sum_{i=1}^{M} d_i$ is large enough, there exists a mapping F such that $X(t+1) = F\{X(t)\}$. The evolution $X(t) \to X(t+1)$ then reflects the evolution of the original dynamical system. The problem is to find an appropriate expression for the nonlinear mapping F.

Up to the present, many chaotic time series prediction models have been developed. Neural networks have been widely used because of their universal approximation capability.

2.2 Neural Network Model

A multilayer perceptron (MLP) trained with the back-propagation (BP) algorithm is used as a nonlinear predictor for multivariate chaotic time series prediction. The MLP is trained by supervised learning to minimize the mean square error between the computed output of the neural network and the desired output. The network usually consists of an input layer, one or more hidden layers and an output layer.

Consider a three-layer MLP that contains one hidden layer. The D-dimensional delayed time series X(t) is used as the input of the network to generate the network output X(t+1). The neural network can then be expressed as follows:

$$o_j = f\Big(\sum_{i=1}^{N_I} x_i\, w_{ij}^{(I)}\Big) \tag{2}$$

$$y_k = \sum_{j=1}^{N_H} w_{jk}^{(O)}\, o_j \tag{3}$$

where $[x_1, x_2, \ldots, x_{N_I}] = X(t)$ denotes the input signal, $N_I$ is the number of input signals to the neural network, $w_{ij}^{(I)}$ is the weight connecting the ith input neuron to the jth hidden neuron, $o_j$ is the output of the jth hidden neuron, $N_H$ is the number of neurons in the hidden layer, $[y_1, y_2, \ldots, y_{N_O}] = X(t+1)$ is the output, $N_O$ is the number of output neurons, and $w_{jk}^{(O)}$ is the weight connecting the jth hidden neuron to the kth output neuron.

The activation function $f(\cdot)$ is the sigmoid function given by

$$f(x) = \frac{1}{1 + \exp(-x)} \tag{4}$$
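A property of Eq. (4) that back-propagation relies on is that the derivative of the sigmoid can be written in terms of its own output. A short derivation (standard calculus, not taken from the paper):

$$f'(x) = \frac{\exp(-x)}{(1+\exp(-x))^2} = \frac{1}{1+\exp(-x)}\cdot\frac{\exp(-x)}{1+\exp(-x)} = f(x)\,\big(1 - f(x)\big)$$

For a hidden unit with output $o_j = f(\cdot)$, the derivative with respect to its net input is therefore $o_j(1 - o_j)$.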

The error function of the network is usually defined as the sum of squared errors

$$E = \frac{1}{2}\sum_{t=1}^{N}\sum_{k=1}^{N_O} \left[y_k(t) - p_k(t)\right]^2, \quad t = 1, 2, \ldots, N \tag{5}$$

where $p_k(t)$ is the desired output for unit k and N is the length of the training sample.
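As a minimal sketch, the forward pass of the three-layer network and the error function can be written in plain Python. The names `forward` and `sum_squared_error`, the nested-list weight layout and the toy weights are hypothetical; the factor 1/2 in the error is an assumption of this sketch (it is the usual convention that makes the back-propagated error term equal to the plain difference between target and output).

```python
import math


def sigmoid(v):
    """Sigmoid activation of Eq. (4)."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_in, w_out):
    """One forward pass.

    x:     list of N_I inputs
    w_in:  w_in[i][j] is the weight from input i to hidden unit j
    w_out: w_out[j][k] is the weight from hidden unit j to output k
    Returns (hidden, y).
    """
    n_hidden = len(w_out)
    n_out = len(w_out[0])
    # Eq. (2): sigmoid hidden units
    hidden = [sigmoid(sum(x[i] * w_in[i][j] for i in range(len(x))))
              for j in range(n_hidden)]
    # Eq. (3): linear output units
    y = [sum(w_out[j][k] * hidden[j] for j in range(n_hidden))
         for k in range(n_out)]
    return hidden, y


def sum_squared_error(outputs, targets):
    """Half the sum of squared errors over all samples and output units."""
    return 0.5 * sum((y - p) ** 2
                     for ys, ps in zip(outputs, targets)
                     for y, p in zip(ys, ps))


# Toy weights and data, chosen so the result is easy to check by hand.
w_in = [[1.0, -1.0], [2.0, 0.5]]
w_out = [[2.0], [4.0]]
hidden, y = forward([0.0, 0.0], w_in, w_out)
error = sum_squared_error([y], [[2.0]])
print(hidden, y, error)
```

With a zero input both hidden units output 0.5, so the single output is 2.0 * 0.5 + 4.0 * 0.5 = 3.0 and the error against a target of 2.0 is 0.5.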


2.3 Statistical Variable Selection Method

For multivariate time series, the dimension of the reconstructed phase space is usually very high, and increasing the number of input variables increases the complexity of the prediction model. Therefore, in many practical applications, variable selection is needed to reduce the dimensionality of the input data. The aim of variable selection in this paper is to select a subset of R inputs that retains most of the important features of the original input set. Thus, D − R irrelevant inputs are discarded.

Principal Component Analysis (PCA) is a traditional technique for variable selection [7]. PCA attempts to reduce the dimensionality by first decomposing the normalized input X(t) with the singular value decomposition (SVD) method

$$X = U \Sigma V^T \tag{6}$$

where $\Sigma = \mathrm{diag}[s_1\ s_2\ \ldots\ s_p\ 0\ \ldots\ 0]$, $s_1 \ge s_2 \ge \cdots \ge s_p$ are the first p singular values of X arranged in decreasing order, and U and V are both orthogonal matrices.

The first k singular values are then preserved as the principal components. The final input can be obtained as

$$Z = U_k^T X \tag{7}$$

where $U_k$ consists of the first k columns of U (equivalently, $U_k^T$ is the first k rows of $U^T$).
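A minimal NumPy sketch of this reduction is given below. It assumes that X is stored as a D × N matrix with one variable per row and one sample per column, and that the data are centered; the function name `pca_reduce` and the synthetic rank-2 data are hypothetical and only serve to make the effect visible.

```python
import numpy as np


def pca_reduce(X, k):
    """Project a D x N data matrix onto its first k principal directions.

    Returns (Z, U_k, s): the k x N reduced input, the D x k basis and
    the singular values in decreasing order.
    """
    # Thin SVD: X = U @ diag(s) @ Vt, with s sorted in decreasing order
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]          # first k left singular vectors
    Z = U_k.T @ X           # reduced input
    return Z, U_k, s


# Synthetic example: 5 observed variables driven by only 2 latent signals.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2, 200))
mixing = rng.standard_normal((5, 2))
X = mixing @ latent
X = X - X.mean(axis=1, keepdims=True)   # center each variable

Z, U_k, s = pca_reduce(X, k=2)
print(Z.shape)
print(s)
```

Because the synthetic data have rank 2, only the first two singular values are significantly different from zero and the two retained components reproduce X. Note that each row of Z mixes all five original variables, which is why the reduced inputs are new variables rather than a subset of the original ones.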

PCA is an efficient method for reducing the input dimension. However, there is no guarantee that the discarded factors have no influence on the prediction output, because the variable selection and prediction processes are separate. A neural network selector is a good way to combine the selection process with the prediction process.

3 Sensitivity Analysis with Neural Networks

Variable selection with neural networks can be achieved by pruning the input nodes of a neural network model according to some saliency measure, with the aim of removing less relevant variables. The significance of a variable can be defined as the error when the unit is removed minus the error when it is left in place:

$$S_i = E_{\mathrm{WithoutUnit}\_i} - E_{\mathrm{WithUnit}\_i} = E(x_i = 0) - E(x_i = x_i) \tag{8}$$

where E is the error function defined in Eq. (5).

After the neural network has been trained, a brute-force pruning method is to set each input to zero in turn and evaluate the change in the error. If the error increases too much, the input is restored; otherwise it is removed. In principle, the search can be made exhaustive by training the network on every possible subset of the input set. However, such an exhaustive search is computationally infeasible for large networks.
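The brute-force test for a single input can be sketched as follows: zero one input at a time and record the change in error. The names `predict`, `error` and `removal_significance`, the single-output network, the 1/2 factor in the error and the hand-picked weights are assumptions for illustration; no retraining is performed.

```python
import math


def predict(x, w_in, w_out):
    """One-hidden-layer MLP with sigmoid hidden units and one linear output."""
    hidden = [
        1.0 / (1.0 + math.exp(-sum(xi * row[j] for xi, row in zip(x, w_in))))
        for j in range(len(w_out))
    ]
    return sum(w * h for w, h in zip(w_out, hidden))


def error(inputs, targets, w_in, w_out):
    """Half the sum of squared errors over the data set."""
    return 0.5 * sum((predict(x, w_in, w_out) - p) ** 2
                     for x, p in zip(inputs, targets))


def removal_significance(inputs, targets, w_in, w_out):
    """Error with input i forced to zero minus the error with it in place."""
    base = error(inputs, targets, w_in, w_out)
    scores = []
    for i in range(len(inputs[0])):
        zeroed = [[0.0 if m == i else v for m, v in enumerate(x)]
                  for x in inputs]
        scores.append(error(zeroed, targets, w_in, w_out) - base)
    return scores


# Hypothetical trained weights: input 1 has zero weights, so it is irrelevant.
w_in = [[1.5, -2.0], [0.0, 0.0]]   # w_in[i][j]: input i -> hidden unit j
w_out = [1.0, 0.5]                 # hidden unit j -> output
inputs = [[0.2, 1.0], [-0.5, 3.0], [0.9, -2.0]]
targets = [predict(x, w_in, w_out) for x in inputs]   # network fits exactly

scores = removal_significance(inputs, targets, w_in, w_out)
print(scores)   # scores[0] is positive, scores[1] is zero
```

Each score costs one pass over the training set, so ranking all inputs needs as many passes as there are inputs, and testing subsets of inputs multiplies that cost further.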

This paper follows the idea of Mozer and Smolensky [4] and approximates the sensitivity by introducing a gating term $\alpha_i$ for each unit such that

$$o_j = f\Big(\sum_i w_{ij}\,\alpha_i\,o_i\Big) \tag{9}$$

where $o_j$ is the activity of unit j and $w_{ij}$ is the weight from unit i to unit j.


The gating term $\alpha$ is shown in Fig. 1, where $\alpha_i^I$, $i = 1, 2, \ldots, N_I$, is the gating term of the ith input neuron and $\alpha_j^H$, $j = 1, 2, \ldots, N_H$, is the gating term of the jth hidden neuron.

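[Figure: three-layer network with inputs $x_1, \ldots, x_i, \ldots, x_{N_I}$, output $y_k$, input gating terms $\alpha_1^I, \ldots, \alpha_{N_I}^I$ and hidden gating terms $\alpha_1^H, \ldots, \alpha_{N_H}^H$]

Fig. 1. The gating term for each unit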

The gating term $\alpha$ is merely a notational convenience rather than a parameter that must be implemented in the network. If $\alpha = 0$, the unit has no influence on the network; if $\alpha = 1$, the unit behaves normally.
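The effect of the gating term on a single unit can be illustrated with a few lines of Python. The function name `gated_unit` and the numeric values are hypothetical; the unit uses the sigmoid activation.

```python
import math


def gated_unit(activities, weights, gates):
    """Activity of one unit fed by gated upstream units (sigmoid activation)."""
    net = sum(w * a * o for w, a, o in zip(weights, gates, activities))
    return 1.0 / (1.0 + math.exp(-net))


o = [0.8, -0.3]      # upstream activities
w = [1.5, 2.0]       # weights into the unit

normal = gated_unit(o, w, [1.0, 1.0])           # both gates open
without_second = gated_unit(o, w, [1.0, 0.0])   # second upstream unit gated off
only_first = gated_unit([0.8], [1.5], [1.0])    # network without that unit
print(normal, without_second, only_first)
```

Closing a gate gives the same activity as a network in which the gated unit was never present, which is what allows the gate to stand in for structural removal.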

The importance of a unit is then approximated by the derivative

$$S_i = -\left.\frac{\partial E}{\partial \alpha_i}\right|_{\alpha_i = 1} \tag{10}$$
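The link between the removal-based definition in Eq. (8) and the derivative in Eq. (10) can be sketched with a first-order Taylor expansion of E in the gating term around $\alpha_i = 1$. This is an illustrative argument under the assumption that E varies smoothly with $\alpha_i$; it does not bound the approximation error:

$$E(\alpha_i = 0) \approx E(\alpha_i = 1) + (0 - 1)\left.\frac{\partial E}{\partial \alpha_i}\right|_{\alpha_i = 1}
\quad\Longrightarrow\quad
E(\alpha_i = 0) - E(\alpha_i = 1) \approx -\left.\frac{\partial E}{\partial \alpha_i}\right|_{\alpha_i = 1}$$

The left-hand side is the error without the unit minus the error with it, so the derivative stands in for re-evaluating the network with the unit removed.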

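By using a standard error back-propagation algorithm, the derivative in Eq. (10) can be expressed in terms of the network weights as follows

$$S_j^H = -\frac{\partial E}{\partial \alpha_j^H} = -\sum_{t=1}^{N}\sum_{k=1}^{N_O} \frac{\partial E}{\partial y_k}\,\frac{\partial y_k}{\partial \alpha_j^H} = \sum_{t=1}^{N}\sum_{k=1}^{N_O} \left[\big(p_k(t) - y_k(t)\big)\, w_{jk}^{(O)}\, o_j\right] \tag{11}$$

$$S_i^I = -\frac{\partial E}{\partial \alpha_i^I} = -\sum_{t=1}^{N}\sum_{k=1}^{N_O} \frac{\partial E}{\partial y_k}\,\frac{\partial y_k}{\partial \alpha_i^I} = \sum_{t=1}^{N}\sum_{k=1}^{N_O} \left[\big(p_k(t) - y_k(t)\big) \sum_{j=1}^{N_H} w_{jk}^{(O)}\, o_j (1 - o_j)\, w_{ij}^{(I)}\, x_i(t)\right] \tag{12}$$

where $S_i^I$ is the sensitivity of the ith input neuron and $S_j^H$ is the sensitivity of the jth hidden neuron. The factor $o_j(1 - o_j)$ is the derivative of the sigmoid in Eq. (4). Thus the algorithm can prune the input nodes as well as the hidden nodes according to their sensitivity during training.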
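The two sensitivities can be sketched in plain Python and checked against a numerical derivative with respect to the gating terms. All function names, the nested-list weight layout and the toy data are hypothetical; the error is taken as half the sum of squared errors, which is an assumption of this sketch.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_error(inputs, targets, w_in, w_out, a_in, a_hid):
    """Error of a network whose inputs and hidden units carry gating terms."""
    total = 0.0
    for x, p in zip(inputs, targets):
        hid = [sigmoid(sum(w_in[i][j] * a_in[i] * x[i] for i in range(len(x))))
               for j in range(len(w_out))]
        for k in range(len(p)):
            y_k = sum(w_out[j][k] * a_hid[j] * hid[j] for j in range(len(hid)))
            total += 0.5 * (y_k - p[k]) ** 2   # 1/2 factor assumed
    return total


def sensitivities(inputs, targets, w_in, w_out):
    """Analytic -dE/d(alpha) at alpha = 1 for every input and hidden unit."""
    n_in, n_hid = len(w_in), len(w_out)
    s_in, s_hid = [0.0] * n_in, [0.0] * n_hid
    for x, p in zip(inputs, targets):
        hid = [sigmoid(sum(w_in[i][j] * x[i] for i in range(n_in)))
               for j in range(n_hid)]
        for k in range(len(p)):
            y_k = sum(w_out[j][k] * hid[j] for j in range(n_hid))
            err = p[k] - y_k
            for j in range(n_hid):
                # hidden-unit term: error times outgoing weight times activity
                s_hid[j] += err * w_out[j][k] * hid[j]
                for i in range(n_in):
                    # input term: back-propagated through the sigmoid slope
                    s_in[i] += (err * w_out[j][k] * hid[j] * (1.0 - hid[j])
                                * w_in[i][j] * x[i])
    return s_in, s_hid


def numeric_sensitivity(inputs, targets, w_in, w_out, layer, index, h=1e-5):
    """Central-difference estimate of -dE/d(alpha) at alpha = 1 for one gate."""
    def err_at(value):
        a_in = [1.0] * len(w_in)
        a_hid = [1.0] * len(w_out)
        (a_in if layer == "input" else a_hid)[index] = value
        return gated_error(inputs, targets, w_in, w_out, a_in, a_hid)
    return -(err_at(1.0 + h) - err_at(1.0 - h)) / (2.0 * h)


# Toy network and data (arbitrary values, not from the paper).
w_in = [[0.4, -0.6], [0.1, 0.8]]     # w_in[i][j]
w_out = [[1.2], [-0.7]]              # w_out[j][k]
inputs = [[0.5, -1.0], [1.0, 0.3], [-0.7, 0.8]]
targets = [[0.2], [0.9], [-0.1]]

s_in, s_hid = sensitivities(inputs, targets, w_in, w_out)
print(s_in, s_hid)
```

The analytic values only need quantities that back-propagation already computes, whereas the numerical check evaluates the whole training error twice per gate.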

However, the sensitivity fluctuates strongly when it is calculated directly using Eq. (11) and Eq. (12), because of the engineering approximation in Eq. (10), and an input may sometimes be deleted incorrectly. To reduce the dimensionality of the input vectors reliably, the sensitivity matrix needs to be evaluated over the entire training set. This paper considers several ways to define the overall sensitivity:

(1) The mean square average sensitivity:

$$S_{i,\mathrm{avg}} = \sqrt{\frac{\sum_{t=1}^{N} \big(S_i(t)\big)^2}{T}} \tag{13}$$

where T is the number of data in the training set.

(2) The absolute value average sensitivity:

$$S_{i,\mathrm{abs}} = \frac{\sum_{t=1}^{N} \left|S_i(t)\right|}{T} \tag{14}$$

(3) The maximum absolute sensitivity:

$$S_{i,\max} = \max_{1 \le t \le N} \left|S_i(t)\right| \tag{15}$$

Any of the sensitivity measures in Eqs. (13)-(15) can provide a useful criterion for determining which inputs are to be deleted. For brevity, this paper uses the mean square average sensitivity as an example. An input with low sensitivity has little or no influence on the prediction accuracy and can therefore be removed.
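The three aggregate measures can be sketched for a single input as follows. The sketch assumes that $S_i(t)$ is the per-sample sensitivity of input i, that the mean square average is the root of the mean squared value, and that the sum runs over all T training samples; the function name `overall_sensitivity` and the two-sample example are hypothetical.

```python
import math


def overall_sensitivity(per_sample):
    """Aggregate the per-sample sensitivities S_i(t) of one input."""
    T = len(per_sample)
    return {
        # root of the mean squared sensitivity
        "avg": math.sqrt(sum(s * s for s in per_sample) / T),
        # mean of the absolute sensitivities
        "abs": sum(abs(s) for s in per_sample) / T,
        # largest absolute sensitivity over the training set
        "max": max(abs(s) for s in per_sample),
    }


measures = overall_sensitivity([3.0, -4.0])
print(measures)
```

All three measures discard the sign, so positive and negative per-sample sensitivities cannot cancel out when they are combined over the training set.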

To obtain a more efficient criterion for pruning inputs, the sensitivity is normalized. Define the sum of the absolute sensitivities over all input nodes

$$S = \sum_{i=1}^{N_I} \left|S_i\right| \tag{16}$$

The normalized sensitivity of each unit can then be defined as

$$\hat{S}_i = \frac{\left|S_i\right|}{S} \tag{17}$$

where the normalized value $\hat{S}_i$ lies in [0, 1].

The input variables are then arranged in decreasing order of sensitivity:

$$\hat{S}_1 \ge \hat{S}_2 \ge \cdots \ge \hat{S}_{N_I} \tag{18}$$

Larger values of $\hat{S}_i$, $i = 1, 2, \ldots, N_I$, indicate the more important variables. Define the sum of the first k terms of the sensitivity

$$\eta_k = \sum_{j=1}^{k} \hat{S}_j \tag{19}$$

where $k = 1, 2, \ldots, N_I$.

Given a threshold value $0 < \eta_0 < 1$, if $\eta_k > \eta_0$, the first k inputs are preserved as the principal components and the remaining inputs, which have low sensitivity, are removed. The number of variables retained increases as the threshold $\eta_0$ increases.
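The normalization, ordering and threshold steps can be combined in a short Python sketch. It reads the rule as keeping the smallest k whose cumulative normalized sensitivity exceeds the threshold, which is an interpretation; the function name `select_inputs` and the example sensitivities are hypothetical, and the total sensitivity is assumed to be nonzero.

```python
def select_inputs(sensitivity, threshold):
    """Return the indices of the inputs to keep and the normalized sensitivities.

    sensitivity: overall sensitivity of each input (any sign)
    threshold:   eta_0, strictly between 0 and 1
    """
    total = sum(abs(s) for s in sensitivity)             # absolute sum
    normalized = [abs(s) / total for s in sensitivity]   # each value in [0, 1]
    # indices ordered by decreasing normalized sensitivity
    order = sorted(range(len(normalized)),
                   key=lambda i: normalized[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += normalized[i]                      # running sum eta_k
        if cumulative > threshold:
            break
    return kept, normalized


scores = [0.5, 4.0, 0.5, 3.0, 2.0]
kept_high, normalized = select_inputs(scores, threshold=0.85)
kept_low, _ = select_inputs(scores, threshold=0.5)
print(kept_high, kept_low)
```

With these example values the cumulative sums are roughly 0.4, 0.7 and 0.9, so a threshold of 0.85 keeps three inputs while a threshold of 0.5 keeps two, in line with the statement that a larger threshold retains more variables.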
