Lecture Notes in Computer Science


4 Simulations

In this section, two simulations are carried out on both computer-generated data and

practical observed data to demonstrate the performance of the variable selection

method proposed in this paper. Then the simulation results are compared with the

Variable Selection for Multivariate Time Series Prediction with Neural Networks


PCA method. The prediction performance can be evaluated by two error evaluation criteria [8]: the Root Mean Square Error $E_{RMSE}$ and the Prediction Accuracy $E_{PA}$:

$$E_{RMSE} = \left( \frac{1}{N-1} \sum_{t=1}^{N} \left[ P(t) - O(t) \right]^2 \right)^{1/2} \tag{20}$$

$$E_{PA} = \frac{\sum_{t=1}^{N} \left( P(t) - P_m \right)\left( O(t) - O_m \right)}{(N-1)\,\sigma_P \sigma_O} \tag{21}$$

where O(t) is the target value, P(t) is the predicted value, $O_m$ is the mean value of O(t), $\sigma_O$ is the standard deviation of O(t), and $P_m$ and $\sigma_P$ are the mean value and standard deviation of P(t), respectively. $E_{RMSE}$ reflects the absolute deviation between the predicted value and the observed value, while $E_{PA}$ denotes the correlation coefficient between the observed and predicted values. In the ideal situation, if there are no errors in prediction, these parameters will be $E_{RMSE} = 0$ and $E_{PA} = 1$.
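As an illustration of Eqs. (20) and (21), the following Python sketch computes both criteria for two equal-length sequences. The function names are hypothetical, and the standard deviations are assumed to be sample standard deviations (N − 1 in the denominator), which the text does not state explicitly; under that assumption a perfect prediction gives a prediction accuracy of 1.

```python
import math

def rmse(predicted, observed):
    """E_RMSE: square root of the summed squared error divided by N - 1."""
    n = len(observed)
    squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared_error / (n - 1))

def prediction_accuracy(predicted, observed):
    """E_PA: correlation coefficient between predicted and observed values."""
    n = len(observed)
    p_m = sum(predicted) / n
    o_m = sum(observed) / n
    # Assumption: sample standard deviations (N - 1 in the denominator).
    sigma_p = math.sqrt(sum((p - p_m) ** 2 for p in predicted) / (n - 1))
    sigma_o = math.sqrt(sum((o - o_m) ** 2 for o in observed) / (n - 1))
    covariance_sum = sum((p - p_m) * (o - o_m)
                         for p, o in zip(predicted, observed))
    return covariance_sum / ((n - 1) * sigma_p * sigma_o)

# Toy data, not taken from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [1.5, 2.5, 3.5, 4.5]
print(rmse(shifted, observed), prediction_accuracy(shifted, observed))
```

The shifted toy prediction shows why both criteria are reported: a constant offset leaves the correlation untouched while the root mean square error grows.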

4.1 Prediction of Lorenz Time Series

The first data is derived from the Lorenz system, given by three differential equations:

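$$\begin{cases} \dfrac{\mathrm{d}x(t)}{\mathrm{d}t} = a\left(-x(t) + y(t)\right) \\[4pt] \dfrac{\mathrm{d}y(t)}{\mathrm{d}t} = b\,x(t) - y(t) - x(t)z(t) \\[4pt] \dfrac{\mathrm{d}z(t)}{\mathrm{d}t} = x(t)y(t) - c\,z(t) \end{cases} \tag{22}$$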

where the typical values for the coefficients are a=10, b=8/3, c=28 and the initial values are x(0)=12, y(0)=2, z(0)=9. A total of 1500 points of x(t), y(t) and z(t), obtained with the fourth-order Runge-Kutta method, are used as the training sample and 500 points as the testing sample.
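The data generation step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the step size h = 0.01 is an assumption (the text does not give it), and the default coefficients place 28 on the x term of the second equation and 8/3 on the z term of the third, which is the classical chaotic setting of the Lorenz system; how the printed values of b and c map onto Eq. (22) is likewise an assumption here.

```python
def lorenz(state, a, b, c):
    """Right-hand side of the Lorenz system in the form of Eq. (22)."""
    x, y, z = state
    return (a * (y - x), b * x - y - x * z, x * y - c * z)

def rk4_step(f, state, h, *params):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state, *params)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)), *params)
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)), *params)
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)), *params)
    return tuple(s + h / 6.0 * (p + 2.0 * q + 2.0 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def simulate(n_points, h=0.01, state=(12.0, 2.0, 9.0),
             a=10.0, b=28.0, c=8.0 / 3.0):
    """Return n_points states, starting from the given initial state."""
    trajectory = [state]
    for _ in range(n_points - 1):
        state = rk4_step(lorenz, state, h, a, b, c)
        trajectory.append(state)
    return trajectory

series = simulate(2000)
train, test = series[:1500], series[1500:]  # 1500 training, 500 testing points
```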

In order to extract the dynamics of this system to predict x(t+1), the parameters for phase-space reconstruction are chosen as $\tau_x = \tau_y = \tau_z = 3$ and $m_x = m_y = m_z = 9$. Thus an MLP neural network with 27 input nodes, one hidden layer of 20 neurons and one output node is considered, and a back-propagation training algorithm is used.
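To make the input layout concrete, the sketch below builds delay-embedded input rows and one-step-ahead targets in Python. The function name, the ordering of the lags within a row, and the use of one shared embedding dimension and delay for all variables are assumptions made for illustration.

```python
def delay_embed(series_list, m, tau):
    """Build rows [s(t), s(t - tau), ..., s(t - (m - 1) * tau)] for every
    series s, with the first series one step ahead as the target."""
    n = len(series_list[0])
    start = (m - 1) * tau          # earliest t with a full delay window
    inputs, targets = [], []
    for t in range(start, n - 1):  # stop one early: the target is at t + 1
        row = []
        for s in series_list:
            row.extend(s[t - j * tau] for j in range(m))
        inputs.append(row)
        targets.append(series_list[0][t + 1])
    return inputs, targets

# Toy series standing in for x(t), y(t), z(t); not the Lorenz data.
xs = [float(i) for i in range(100)]
ys = [2.0 * v for v in xs]
zs = [-v for v in xs]
inputs, targets = delay_embed([xs, ys, zs], m=9, tau=3)
print(len(inputs[0]))  # number of network inputs per sample
```

With three series, an embedding dimension of 9 and a delay of 3, each row has 3 × 9 = 27 entries, matching the 27 input nodes described above.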

After the training process of the MLP neural network is stopped, sensitivity analysis is carried out to evaluate the contribution of each input variable to the error function of the neural network. The trajectories of the sensitivity through training for each input are shown in Fig. 2. It can be seen that the sensitivity fluctuates during training and finally converges when the weights and error are steady. The normalized sensitivity measures in Eq. (17) are then calculated. A threshold of 0.98 is chosen to determine which inputs are discarded, which reduces the input dimension of the neural network to 11. The original input matrix is replaced by the reduced input matrix and the structure of the neural network is simplified. The prediction performance over the testing samples with the reduced inputs is shown in Fig. 4.
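Eq. (17) and the exact thresholding rule are not reproduced in this section, so the following Python sketch shows only one plausible reading, stated here as an assumption: the sensitivities are normalized to sum to 1, and inputs are kept in descending order of sensitivity until their cumulative share reaches the threshold. The function names are hypothetical.

```python
def select_inputs(sensitivities, threshold=0.98):
    """Return the indices of inputs to keep.

    Assumed rule (not confirmed by the text): keep the most sensitive
    inputs until their cumulative normalized sensitivity reaches threshold.
    """
    total = sum(sensitivities)
    order = sorted(range(len(sensitivities)),
                   key=lambda i: sensitivities[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        if cumulative >= threshold:
            break
        kept.append(i)
        cumulative += sensitivities[i] / total
    return sorted(kept)

def reduce_rows(inputs, kept):
    """Drop the discarded columns from every input row."""
    return [[row[i] for i in kept] for row in inputs]

# Toy sensitivities for five inputs; not values from the paper.
kept = select_inputs([0.6, 0.005, 0.3, 0.005, 0.09])
print(kept, reduce_rows([[1, 2, 3, 4, 5]], kept))
```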


M. Han and R. Wei

Fig. 2. The trajectories of the input sensitivity through training (horizontal axis: Epoch)

Fig. 3. The normalized sensitivity for each input node (horizontal axis: Input Nodes)

Fig. 4. The observed and predicted values of Lorenz x(t) time series (axes: Time, x(t); legend: Observed, Predicted)

The solid line in Fig. 4 represents the observed values while the dashed line represents the predicted values. It can be seen from Fig. 4 that the chaotic behavior of the x(t) time series is well predicted and the errors between the observed values and the predicted values are small. The prediction performance is listed in Table 1 and compared with the PCA variable reduction method.

Table 1. Prediction performance of the x(t) time series

               With All Variables   PCA Selection   NN Selection
Input Nodes    27                                   11
$E_{RMSE}$     0.1278               0.1979          0.0630
$E_{PA}$       0.9998               0.9997          1.0000

The prediction performance in Table 1 is comparable for the variable selection method with neural networks and the PCA method, while the algorithm proposed in this paper obtains the best prediction accuracy.


4.2 Prediction of the Rainfall Time Series

Rainfall is an important variable in hydrological systems. The chaotic characteristic of the rainfall time series has been proven in many papers [9]. In this section, the simulation is carried out on the monthly rainfall time series of the city of Dalian, China over a period of 660 months (from 1951 to 2005). Rainfall may be influenced by many factors, so in this paper five other time series are also considered: the temperature, air-pressure, humidity, wind-speed and sunlight time series.

This method also follows Takens' theorem to first reconstruct the embedding phase space, with the embedding dimension and delay time chosen as $m_i = 9$ and $\tau_i = 3$ for each of the six variables ($i = 1, \ldots, 6$). The input of the neural network then contains $L = 660 - (9-1) \times 3 = 636$ data points. In the experiments, this data set is divided into a training set composed of the first 436 points and a testing set containing the remaining 200 points.

The neural network used in this paper then contains 54 input nodes, 20 hidden nodes and 1 output node. The threshold is again chosen as 0.98. The trajectories of the input sensitivity and the normalized sensitivity for every input are shown in Fig. 5 and Fig. 6, respectively. According to the sensitivity values, 34 input nodes are retained.

Fig. 5. The trajectories of the input sensitivity through training (horizontal axis: Epoch)

Fig. 6. The normalized sensitivity for each input node (horizontal axis: Input Nodes)

The observed and predicted values of the rainfall time series are shown in Fig. 7, which indicates high prediction accuracy.

It can be seen from the figure that the chaotic behavior of the rainfall time series is well predicted and the errors between the observed values and the predicted values are small. The corresponding values of $E_{RMSE}$ and $E_{PA}$ are shown in Table 2.

Both the figures and the error evaluation criteria indicate that the result for multivariate chaotic time series using the neural-network-based variable selection is much better than the results with all variables and with the PCA method.

It can be concluded from the two simulations that the variable selection algorithm using neural networks is able to capture the dynamics of both computer-generated and practical time series accurately and gives high prediction accuracy.


Fig. 7. The observed and predicted values of rainfall time series (horizontal axis: t (month); legend: Observed, Predicted)

Table 2. Prediction performance of the rainfall time series

               With All Variables   PCA Selection   NN Selection
Input Nodes    54                                   34
$E_{RMSE}$     22.2189              21.0756         18.1435
$E_{PA}$       0.9217               0.9286          0.9529

5 Conclusions

This paper studies the variable selection algorithm using the sensitivity for pruning input

nodes in a neural network model. A simple and effective criterion for identifying input

nodes to be removed is also derived which does not require high computational cost and

proves to work well in practice. The validity of the method was examined through a

multivariate prediction problem and a comparison study was made with other variable

selection methods. Experimental results encourage the application of the proposed

method to complex tasks that need to identify significant input variables.

Acknowledgements. This research is supported by the project (60674073) of the National Natural Science Foundation of China, the project (2006CB403405) of the National Basic Research Program of China (973 Program) and the project (2006BAB14B05) of the National Key Technology R&D Program of China. All of this support is appreciated.

References

[1] Verikas, B.M.: Feature selection with neural networks. Pattern Recognition Letters 23, 1323–1335 (2002)

[2] Castellano, G., Fanelli, A.M.: Variable selection using neural network models. Neurocomputing 31, 1–13 (2000)

[3] Castellano, G., Fanelli, A.M., Pelillo, M.: An iterative method for pruning feed-forward neural networks. IEEE Trans. Neural Networks 8(3), 519–531 (1997)

[4] Mozer, M.C., Smolensky, P.: Skeletonization: a technique for trimming the fat from a network via a relevance assessment. NIPS 1, 107–115 (1989)

[5] Gevrey, M., Dimopoulos, I., Lek, S.: Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 160, 249–264 (2003)

[6] Cao, L.Y., Mees, A., Judd, K.: Dynamics from multivariate time series. Physica D 121, 75–88 (1998)

[7] Han, M., Fan, M., Xi, J.: Study of Nonlinear Multivariate Time Series Prediction Based on Neural Networks. In: Wang, J., Liao, X.-F., Yi, Z. (eds.) ISNN 2005. LNCS, vol. 3497, pp. 618–623. Springer, Heidelberg (2005)

[8] Chen, J.L., Islam, S., Biswas, P.: Nonlinear dynamics of hourly ozone concentrations: nonparametric short term prediction. Atmospheric Environment 32(11), 1839–1848 (1998)

[9] Liu, D.L., Scott, B.J.: Estimation of solar radiation in Australia from rainfall and temperature observations. Agricultural and Forest Meteorology 106(1), 41–59 (2001)

Ordering Process of Self-Organizing Maps Improved by Asymmetric Neighborhood Function

Takaaki Aoki, Kaiichiro Ota, Koji Kurata, and Toshio Aoyagi

CREST, JST, Kyoto 606-8501, Japan
Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
Faculty of Engineering, University of the Ryukyus, Okinawa 903-0213, Japan
aoki@acs.i.kyoto-u.ac.jp

Abstract. The Self-Organizing Map (SOM) is an unsupervised learning method based on neural computation, which has recently found wide application. However, the learning process sometimes exhibits multi-stable states, in which the map is trapped in an undesirable disordered state that includes topological defects. These topological defects critically degrade the performance of the SOM. In order to overcome this problem, we propose to introduce an asymmetric neighborhood function for the SOM algorithm. Compared with the conventional symmetric one, the asymmetric neighborhood function accelerates the ordering process even in the presence of a defect. However, this asymmetry tends to generate a distorted map. This can be suppressed by an improved method of the asymmetric neighborhood function. In the case of a one-dimensional SOM, the number of steps required for perfect ordering is numerically shown to be reduced from

$O(N^3)$ to $O(N^2)$.

Keywords: Self-Organizing Map, Asymmetric Neighborhood Function,

Fast ordering.

1 Introduction

The Self-Organizing Map (SOM) is an unsupervised learning method that can be viewed as a type of nonlinear principal component analysis [1]. Historically, it was proposed as a simplified neural network model having some essential properties to reproduce topographic representations observed in the brain [2,3,4,5]. The SOM algorithm can be used to construct an ordered mapping from input stimulus data onto a two-dimensional array of neurons according to the topological relationships between various characters of the stimulus. This implies that the SOM algorithm is capable of extracting the essential information from complicated data. From the viewpoint of applied information processing, the SOM algorithm can be regarded as a generalized, nonlinear type of principal component analysis and has proven valuable in the fields of visualization, compression and data mining. Being based on a simple, biologically motivated learning rule, this algorithm behaves as an unsupervised learning method and provides robust performance without delicate tuning of learning conditions.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 426–435, 2008.
© Springer-Verlag Berlin Heidelberg 2008

Fig. 1. A: An example of a topological defect in a two-dimensional array of SOM with a uniform rectangular input space. The triangle point indicates the conflicting point in the feature map. B: Another example of a topological defect in a one-dimensional array with scalar input data (axes: Unit Number i, Reference Vector $m_i$). The triangle points also indicate the conflicting points.

However, there is a serious problem of multi-stability or meta-stability in the learning process [6,7,8]. When the learning process is trapped in one of these states, the map appears, for practical purposes, to have converged to its final state. However, some of these states are undesirable as solutions of the learning procedure; typically the map has topological defects, as shown in Fig. 1A. The map in Fig. 1A is twisted, with a topological defect at the center. In this situation, the two-dimensional array of the SOM should be arranged in the square space, because the input data are taken uniformly from the square space. However, this topological defect is a global conflicting point that is difficult to remove by local modulations of the reference vectors of units. Therefore, a very large number of learning steps is required to rectify the topological defect. Thus, the existence of the topological defect critically degrades the performance of the SOM algorithm.

To avoid the emergence of the topological defect, several conventional and empirical methods have been used. However, it is preferable that the SOM algorithm work well without tuning any model parameters, even when a topological defect has emerged. Thus, let us consider a simple method that enables effective ordering of the SOM in the presence of the topological defect. To this end, we propose an asymmetric neighborhood function which effectively removes the topological defect [9]. In the process of removing the topological defect, the conflicting point must be moved out toward the boundary of the array, where it vanishes. Therefore, the motion of the defect is essential for the efficiency of the ordering process. With the original symmetric neighborhood, the movement of the defect is similar to a random walk, which is inefficient. By introducing asymmetry into the neighborhood function, the movement behaves like a drift, which enables faster ordering. For this reason, in this paper we investigate the effect of an asymmetric neighborhood function on the performance of the SOM algorithm for the case of one-dimensional and two-dimensional SOMs.


T. Aoki et al.

2 Methods

2.1 SOM

The SOM constructs a mapping from the input data space to the array of nodes, which we call the ‘feature map’. To each node $i$, a parametric ‘reference vector’ $\mathbf{m}_i$ is assigned. Through SOM learning, these reference vectors are rearranged according to the following iterative procedure. An input vector $\mathbf{x}(t)$ is presented at each time step $t$, and the best matching unit, whose reference vector is closest to the given input vector $\mathbf{x}(t)$, is chosen. The best matching unit $c$, called the ‘winner’, is given by $c = \arg\min_i \|\mathbf{x}(t) - \mathbf{m}_i\|$. In other words, the data $\mathbf{x}(t)$ in the input data space is mapped onto the node $c$ associated with the reference vector $\mathbf{m}_c$ closest to $\mathbf{x}(t)$. In SOM learning, the update rule for reference vectors is given by

$$\mathbf{m}_i(t+1) = \mathbf{m}_i(t) + \alpha \cdot h(r_{ic})\left[\mathbf{x}(t) - \mathbf{m}_i(t)\right], \qquad r_{ic} \equiv \|\mathbf{r}_i - \mathbf{r}_c\| \tag{1}$$

where $\alpha$, the learning rate, is some small constant. The function $h(r)$ is called the ‘neighborhood function’, in which $r_{ic}$ is the distance from the position $\mathbf{r}_c$ of the winner node $c$ to the position $\mathbf{r}_i$ of a node $i$ on the array of units. A widely used neighborhood function is the Gaussian function defined by $h(r_{ic}) = \exp\left(-r_{ic}^2 / 2\sigma^2\right)$. We expect an ordered mapping after iterating the above procedure a sufficient number of times.
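The iterative procedure can be sketched in Python for the simplest setting, a one-dimensional array with scalar inputs. This is an illustrative sketch under stated assumptions: units sit on an integer lattice (so the array distance between units i and c is |i − c|), inputs are drawn uniformly from [0, 1), and the function names and the parameter values for the learning rate and Gaussian width are hypothetical.

```python
import math
import random

def gaussian(r, sigma):
    """Symmetric Gaussian neighborhood function h(r)."""
    return math.exp(-r * r / (2.0 * sigma * sigma))

def som_step(m, x, alpha, sigma):
    """One SOM update for scalar input x; m is the list of reference values."""
    # Winner: the unit whose reference value is closest to the input.
    c = min(range(len(m)), key=lambda i: abs(x - m[i]))
    # Move every unit toward x, weighted by its array distance to the winner.
    return [m_i + alpha * gaussian(abs(i - c), sigma) * (x - m_i)
            for i, m_i in enumerate(m)]

def train(n_units=20, steps=20000, alpha=0.05, sigma=2.0, seed=0):
    """Iterate the update rule from random initial reference values."""
    rng = random.Random(seed)
    m = [rng.random() for _ in range(n_units)]
    for _ in range(steps):
        m = som_step(m, rng.random(), alpha, sigma)
    return m

reference_values = train()
```

Whether the resulting map is monotonically ordered depends on the initial state and on the parameters; the sketch makes no claim about that outcome.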

2.2 Asymmetric Neighborhood Function

We now introduce a method to transform any given symmetric neighborhood function into an asymmetric one (Fig. 2A). Let us define an asymmetry parameter $\beta$ ($\beta \ge 1$), representing the degree of asymmetry, and the unit vector $\mathbf{k}$ indicating the direction of asymmetry. If a unit $i$ is located in the positive direction of $\mathbf{k}$, then the component parallel to $\mathbf{k}$ of the distance from the winner to the unit is scaled by $1/\beta$. If a unit $i$ is located in the negative direction of $\mathbf{k}$, the parallel component of the distance is scaled by $\beta$. Hence, the asymmetric function $h_\beta(r)$, transformed from its symmetric counterpart $h(r)$, is described by

$$h_\beta(r_{ic}) = 2\left(\frac{1}{\beta} + \beta\right)^{-1} \cdot h(\tilde{r}_{ic}), \qquad \tilde{r}_{ic} = \begin{cases} \sqrt{\left(r_\parallel/\beta\right)^2 + r_\perp^2} & (\mathbf{r}_{ic} \cdot \mathbf{k} \ge 0) \\[4pt] \sqrt{\left(\beta r_\parallel\right)^2 + r_\perp^2} & (\mathbf{r}_{ic} \cdot \mathbf{k} < 0) \end{cases} \tag{2}$$

where $\tilde{r}_{ic}$ is the scaled distance from the winner, $r_\parallel$ is the component of $\mathbf{r}_{ic}$ (the displacement from the winner to unit $i$) projected onto $\mathbf{k}$, and $r_\perp$ are the remaining components perpendicular to $\mathbf{k}$. In addition, in order to single out the effect of asymmetry, the overall area of the neighborhood function, $\int_{-\infty}^{\infty} h(r)\,\mathrm{d}r$, is preserved under this transformation. In the special case of the asymmetry parameter $\beta = 1$, $h_\beta(r)$ is equal to the original symmetric function $h(r)$. Figure 2B displays an example of asymmetric Gaussian neighborhood functions in the two-dimensional array of SOM.
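For a one-dimensional array there is no perpendicular component, so the transformation reduces to scaling the signed displacement and multiplying by the prefactor 2/(1/β + β). The Python sketch below, with hypothetical function names and toy parameter values, applies it to a Gaussian and checks numerically that the area is preserved; the midpoint-rule integrator and its integration limits are assumptions chosen only for this check.

```python
import math

def gaussian(r, sigma):
    """Symmetric Gaussian neighborhood function h(r)."""
    return math.exp(-r * r / (2.0 * sigma * sigma))

def asymmetric_gaussian(d, beta, sigma, k=1.0):
    """Asymmetric neighborhood for a 1D array.

    d is the signed displacement r_i - r_c; k is +1.0 or -1.0.
    """
    if d * k >= 0:
        scaled = abs(d) / beta  # positive side: distance scaled by 1/beta
    else:
        scaled = abs(d) * beta  # negative side: distance scaled by beta
    return 2.0 / (1.0 / beta + beta) * gaussian(scaled, sigma)

def area(f, lo=-60.0, hi=60.0, n=120000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (j + 0.5) * step) for j in range(n))

symmetric_area = area(lambda d: gaussian(abs(d), 2.0))
asymmetric_area = area(lambda d: asymmetric_gaussian(d, 1.5, 2.0))
print(symmetric_area, asymmetric_area)
```

The two printed areas should agree up to numerical integration error, because the stretched side contributes β times half the original area and the compressed side 1/β times half, and the prefactor cancels their sum.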
