Lecture Notes in Computer Science
Variable Selection for Multivariate Time Series Prediction with Neural Networks

4 Simulations

In this section, two simulations are carried out on both computer-generated data and practical observed data to demonstrate the performance of the variable selection method proposed in this paper. The simulation results are then compared with the PCA method. The prediction performance is evaluated by two error criteria [8]: the root mean square error E_RMSE and the prediction accuracy E_PA:

$$E_{\mathrm{RMSE}} = \left[ \frac{1}{N-1} \sum_{t=1}^{N} \bigl( P(t) - O(t) \bigr)^2 \right]^{1/2} \tag{20}$$

$$E_{\mathrm{PA}} = \frac{\sum_{t=1}^{N} \bigl[ P(t) - P_m \bigr]\bigl[ O(t) - O_m \bigr]}{(N-1)\,\sigma_P\,\sigma_O} \tag{21}$$

where O(t) is the target value, P(t) is the predicted value, O_m and σ_O are the mean value and standard deviation of O(t), and P_m and σ_P are the mean value and standard deviation of P(t), respectively. E_RMSE reflects the absolute deviation between the predicted value and the observed value, while E_PA is the correlation coefficient between the observed and predicted values. In the ideal situation, with no prediction errors, E_RMSE = 0 and E_PA = 1.
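As a concrete illustration, Eqs. (20) and (21) can be sketched directly in code; the function names below are our own, and `O` and `P` are assumed to be equal-length arrays of observed and predicted values.

```python
import numpy as np

def rmse(O, P):
    """Root mean square error, Eq. (20), with the 1/(N-1) normalisation."""
    N = len(O)
    return np.sqrt(np.sum((P - O) ** 2) / (N - 1))

def prediction_accuracy(O, P):
    """Prediction accuracy, Eq. (21): the sample correlation of O and P."""
    N = len(O)
    return np.sum((P - P.mean()) * (O - O.mean())) / (
        (N - 1) * P.std(ddof=1) * O.std(ddof=1))

# A perfect prediction gives E_RMSE = 0 and E_PA = 1:
O = np.array([1.0, 2.0, 3.0, 4.0])
print(rmse(O, O))                 # 0.0
print(prediction_accuracy(O, O))  # 1.0
```

Note that E_PA is invariant to affine rescaling of the prediction, so a prediction that is systematically scaled can still score E_PA = 1 while E_RMSE reveals the deviation.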
4.1 Prediction of Lorenz Time Series

The first data set is derived from the Lorenz system, given by three differential equations:

$$\begin{cases} \dfrac{\mathrm{d}x(t)}{\mathrm{d}t} = a\bigl[ y(t) - x(t) \bigr] \\[4pt] \dfrac{\mathrm{d}y(t)}{\mathrm{d}t} = b\,x(t) - y(t) - x(t)\,z(t) \\[4pt] \dfrac{\mathrm{d}z(t)}{\mathrm{d}t} = x(t)\,y(t) - c\,z(t) \end{cases} \tag{22}$$

where the typical values of the coefficients are a=10, b=8/3, c=28 and the initial values are x(0)=12, y(0)=2, z(0)=9. 1500 points of x(t), y(t) and z(t), obtained by the fourth-order Runge–Kutta method, are used as the training sample and 500 points as the testing sample.

In order to extract the dynamics of this system to predict x(t+1), the parameters for phase-space reconstruction are chosen as τ_x = τ_y = τ_z = 3 and m_x = m_y = m_z = 9. Thus an MLP neural network with 27 input nodes, one hidden layer of 20 neurons and one output node is considered, and a back-propagation training algorithm is used.

After the training process of the MLP neural network is stopped, sensitivity analysis is carried out to evaluate the contribution of each input variable to the error function of the neural network. The trajectories of the sensitivity through training for each input are shown in Fig. 2. It can be seen that the sensitivity fluctuates through training and finally converges when the weights and error are steady. The normalized sensitivity measures in Eq. (17) are calculated, and a threshold η₀ = 0.98 is chosen to determine which inputs are discarded. Thus the input dimension of the neural network is reduced to 11. The original input matrix is replaced by the reduced input matrix and the structure of the neural network is simplified. The prediction performance over the testing samples with the reduced inputs is shown in Fig. 4.

Fig. 2. The trajectories of the sensitivity through training

Fig. 3. The normalized sensitivity for each input node

Fig. 4. The observed and predicted values of the x(t) time series
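The data generation described above can be sketched as follows. This is a minimal fourth-order Runge–Kutta integrator for Eq. (22); the step size `dt` is an assumption (the paper does not state it), and the coefficient and initial values are passed through exactly as quoted in the text.

```python
import numpy as np

def lorenz_rhs(s, a=10.0, b=8.0 / 3.0, c=28.0):
    """Right-hand side of Eq. (22) with the paper's quoted coefficients."""
    x, y, z = s
    return np.array([a * (y - x),
                     b * x - y - x * z,
                     x * y - c * z])

def rk4_series(n_points, dt=0.01, s0=(12.0, 2.0, 9.0)):
    """Integrate the system with classical fourth-order Runge-Kutta steps."""
    s = np.array(s0, dtype=float)
    out = np.empty((n_points, 3))
    for i in range(n_points):
        out[i] = s
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return out

series = rk4_series(2000)               # 1500 training + 500 testing points
train, test = series[:1500], series[1500:]
```

Each row of `series` is one sample of (x(t), y(t), z(t)); the delay-embedded input vectors for the MLP are then built from these columns.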
The solid line in Fig. 4 represents the observed values while the dashed line represents the predicted values. It can be seen from Fig. 4 that the chaotic behavior of the x(t) time series is well predicted and the errors between the observed and predicted values are small. The prediction performance is reported in Table 1 and compared with the PCA variable reduction method.
Table 1. The prediction performance for the Lorenz time series

              With All Variables   PCA Selection   NN Selection
Input Nodes   27                   11              11
E_RMSE        0.1278               0.1979          0.0630
E_PA          0.9998               0.9997          1.0000

The prediction performance in Table 1 is comparable for the variable selection method with neural networks and the PCA method, while the algorithm proposed in this paper obtains the best prediction accuracy.
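The input-reduction step used above can be sketched in code. Eq. (17) itself is outside this excerpt, so the rule below is an assumption: it keeps the highest-sensitivity inputs until their cumulative share of the total normalized sensitivity reaches the threshold η₀ = 0.98, and discards the rest.

```python
import numpy as np

def select_inputs(sensitivity, eta0=0.98):
    """Return sorted indices of inputs kept under a cumulative threshold.

    `sensitivity` holds one non-negative score per input node. The
    cumulative-share rule is an assumption, since Eq. (17) is not
    reproduced in this excerpt.
    """
    s = np.asarray(sensitivity, dtype=float)
    share = s / s.sum()                 # normalise so the shares sum to 1
    order = np.argsort(share)[::-1]     # most important input first
    cum = np.cumsum(share[order])
    n_keep = int(np.searchsorted(cum, eta0)) + 1
    return np.sort(order[:n_keep])

kept = select_inputs([0.5, 0.3, 0.15, 0.04, 0.01])
print(kept)   # [0 1 2 3]
```

The pruned network is then retrained on the columns of the input matrix indexed by `kept`, which is how 27 inputs shrink to 11 in the Lorenz experiment.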
4.2 Prediction of the Rainfall Time Series

Rainfall is an important variable in hydrological systems. The chaotic characteristic of rainfall time series has been proven in many papers [9]. In this section, the simulation is carried out on the monthly rainfall time series of the city of Dalian, China over a period of 660 months (from 1951 to 2005). Rainfall may be influenced by many factors, so in this paper five other time series are also considered: the temperature, air-pressure, humidity, wind-speed and sunlight time series. This method also follows Takens' theorem to reconstruct the embedding phase space first, with the dimensions and delay times m_1 = m_2 = m_3 = m_4 = m_5 = m_6 = 9 and τ_1 = τ_2 = τ_3 = τ_4 = τ_5 = τ_6 = 3. The input of the neural network then contains L = 660 − (9 − 1) × 3 = 636 data points. In the experiments, this data set is divided into a training set composed of the first 436 points and a testing set containing the remaining 200 points. The neural network used in this paper then contains 54 input nodes, 20 hidden nodes and 1 output node. The threshold is again chosen as η₀ = 0.98. The trajectory of the input sensitivity and the normalized sensitivity for every input are shown in Fig. 5 and Fig. 6, respectively. According to the sensitivity values, 34 input nodes are retained.
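The multivariate phase-space reconstruction above can be sketched as follows: with six series, m = 9 and τ = 3, each sample is a 54-dimensional vector and L = 660 − (9 − 1) × 3 = 636 samples remain. The random input below is a stand-in for the six real meteorological series.

```python
import numpy as np

def embed_multivariate(series, m=9, tau=3):
    """Stack delay vectors from several time series into one input matrix.

    series: array of shape (n_vars, T). Each row contributes m delayed
    copies at lag tau, giving n_vars * m input features per sample.
    """
    n_vars, T = series.shape
    L = T - (m - 1) * tau               # usable samples: 660-(9-1)*3 = 636
    X = np.empty((L, n_vars * m))
    for t in range(L):
        X[t] = series[:, t : t + m * tau : tau].reshape(-1)
    return X

data = np.random.rand(6, 660)           # 6 monthly series, 660 months each
X = embed_multivariate(data)
print(X.shape)  # (636, 54)
```

The first 436 rows of `X` would form the training set and the remaining 200 rows the testing set, matching the split described in the text.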
Fig. 5. The trajectories of the input sensitivity through training

Fig. 6. The normalized sensitivity for each input node
The observed and predicted values of the rainfall time series are shown in Fig. 7, which shows high prediction accuracy. It can be seen from the figure that the chaotic behavior of the rainfall time series is well predicted and the errors between the observed and predicted values are small. The corresponding values of E_RMSE and E_PA are shown in Table 2. Both the figures and the error evaluation criteria indicate that the results for multivariate chaotic time series using the neural-network-based variable selection are much better than the results with all variables and with the PCA method. It can be concluded from the two simulations that the variable selection algorithm using neural networks is able to capture the dynamics of both computer-generated and practical time series accurately and gives high prediction accuracy.
Fig. 7. The observed and predicted values of rainfall time series
Table 2. The prediction performance for the rainfall time series

              With All Variables   PCA Selection   NN Selection
Input Nodes   54                   43              31
E_RMSE        22.2189              21.0756         18.1435
E_PA          0.9217               0.9286          0.9529

5 Conclusions

This paper studies a variable selection algorithm that uses sensitivity analysis to prune the input nodes of a neural network model. A simple and effective criterion for identifying the input nodes to be removed is also derived, which does not require high computational cost and proves to work well in practice. The validity of the method was examined through a multivariate prediction problem, and a comparison study was made with other variable selection methods. The experimental results encourage the application of the proposed method to complex tasks that need to identify significant input variables.

Acknowledgements. This research is supported by the project (60674073) of the National Nature Science Foundation of China, the project (2006CB403405) of the National Basic Research Program of China (973 Program) and the project (2006BAB14B05) of the National Key Technology R&D Program of China. All of these supports are appreciated.

References

[1] Verikas, A., Bacauskiene, M.: Feature selection with neural networks. Pattern Recognition Letters 23, 1323–1335 (2002)
[2] Castellano, G., Fanelli, A.M.: Variable selection using neural network models. Neurocomputing 31, 1–13 (2000)
[3] Castellano, G., Fanelli, A.M., Pelillo, M.: An iterative method for pruning feed-forward neural networks. IEEE Trans. Neural Networks 8(3), 519–531 (1997)
[4] Mozer, M.C., Smolensky, P.: Skeletonization: a technique for trimming the fat from a network via a relevance assessment. NIPS 1, 107–115 (1989)
[5] Gevrey, M., Dimopoulos, I., Lek, S.: Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 160, 249–264 (2003)
[6] Cao, L.Y., Mees, A., Judd, K.: Dynamics from multivariate time series. Physica D 121, 75–88 (1998)
[7] Han, M., Fan, M., Xi, J.: Study of Nonlinear Multivariate Time Series Prediction Based on Neural Networks. In: Wang, J., Liao, X.-F., Yi, Z. (eds.) ISNN 2005. LNCS, vol. 3497, pp. 618–623. Springer, Heidelberg (2005)
[8] Chen, J.L., Islam, S., Biswas, P.: Nonlinear dynamics of hourly ozone concentrations: nonparametric short term prediction. Atmospheric Environment 32(11), 1839–1848 (1998)
[9] Liu, D.L., Scott, B.J.: Estimation of solar radiation in Australia from rainfall and temperature observations. Agricultural and Forest Meteorology 106(1), 41–59 (2001)
Ordering Process of Self-Organizing Maps Improved by Asymmetric Neighborhood Function

Takaaki Aoki¹, Kaiichiro Ota², Koji Kurata³, and Toshio Aoyagi¹,²

¹ CREST, JST, Kyoto 606-8501, Japan
² Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
³ Faculty of Engineering, University of the Ryukyus, Okinawa 903-0213, Japan
aoki@acs.i.kyoto-u.ac.jp

Abstract. The Self-Organizing Map (SOM) is an unsupervised learning method based on neural computation, which has recently found wide applications. However, the learning process sometimes has multi-stable states, in which the map is trapped in an undesirable disordered state containing topological defects. These topological defects critically degrade the performance of the SOM. In order to overcome this problem, we propose to introduce an asymmetric neighborhood function into the SOM algorithm. Compared with the conventional symmetric one, the asymmetric neighborhood function accelerates the ordering process even in the presence of a defect. However, this asymmetry tends to generate a distorted map, which can be suppressed by an improved method of applying the asymmetric neighborhood function. In the case of the one-dimensional SOM, the number of steps required for perfect ordering is numerically shown to be reduced from O(N³) to O(N²).

Keywords: Self-Organizing Map, Asymmetric Neighborhood Function, Fast ordering.

1 Introduction

The Self-Organizing Map (SOM) is an unsupervised learning method, a type of nonlinear principal component analysis [1]. Historically, it was proposed as a simplified neural network model having some essential properties needed to reproduce the topographic representations observed in the brain [2,3,4,5]. The SOM algorithm can be used to construct an ordered mapping from input stimulus data onto a two-dimensional array of neurons according to the topological relationships between various characters of the stimuli. This implies that the SOM algorithm is capable of extracting the essential information from complicated data. From the viewpoint of applied information processing, the SOM algorithm can be regarded as a generalized, nonlinear type of principal component analysis and has proven valuable in the fields of visualization, compression and data mining. Based on a biologically simple learning rule, this algorithm behaves as an unsupervised learning method and provides robust performance without delicate tuning of the learning conditions.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 426–435, 2008. © Springer-Verlag Berlin Heidelberg 2008

Fig. 1. A: An example of a topological defect in a two-dimensional array of SOM with a uniform rectangular input space. The triangle point indicates the conflicting point in the feature map. B: Another example of a topological defect, in a one-dimensional array with scalar input data. The triangle points also indicate the conflicting points.

However, there is a serious problem of multi-stability or meta-stability in the learning process [6,7,8]. When the learning process is trapped in one of these states, the map appears, practically, to have converged to its final state. However, some of these states are undesirable as solutions of the learning procedure; typically, the map has topological defects, as shown in Fig. 1A. The map in Fig. 1A is twisted, with a topological defect at the center. In this situation, the two-dimensional array of SOM should be arranged in the square space, because the input data are taken uniformly from the square space. But this topological defect is a global conflicting point, which is difficult to remove by local modulations of the reference vectors of the units. Therefore, a sheer number of learning steps is required to rectify the topological defect. Thus, the existence of a topological defect critically degrades the performance of the SOM algorithm.

To avoid the emergence of topological defects, several conventional and empirical methods have been used. However, it is more favorable that the SOM algorithm work well without tuning any model parameters, even when a topological defect emerges. Thus, let us consider a simple method which enables an effective ordering procedure of the SOM in the presence of a topological defect. We therefore propose an asymmetric neighborhood function which effectively removes the topological defect [9].
In the process of removing the topological defect, the conflicting point must be moved out toward the boundary of the array, where it vanishes. Therefore, the motion of the defect is essential for the efficiency of the ordering process. With the original symmetric neighborhood, the movement of the defect is similar to a random-walk stochastic process, whose efficiency is poor. By introducing asymmetry into the neighborhood function, the movement behaves like a drift, which enables faster ordering. For this reason, in this paper we investigate the effect of an asymmetric neighborhood function on the performance of the SOM algorithm for the cases of one-dimensional and two-dimensional SOMs.

2 Methods
2.1 SOM

The SOM constructs a mapping from the input data space to an array of nodes, which we call the 'feature map'. To each node i, a parametric 'reference vector' m_i is assigned, and the reference vectors are learned according to the following iterative procedure. An input vector x(t) is presented at each time step t, and the best matching unit, whose reference vector is closest to the given input vector x(t), is chosen. The best matching unit c, called the 'winner', is given by

$$c = \arg\min_i \left\| x(t) - m_i \right\|.$$

In other words, the data x(t) in the input data space is mapped onto the node c associated with the reference vector closest to x(t). In SOM learning, the update rule for the reference vectors is given by

$$m_i(t+1) = m_i(t) + \alpha \, h(r_{ic}) \left[ x(t) - m_i(t) \right], \qquad r_{ic} \equiv \left\| r_i - r_c \right\|, \tag{1}$$

where α, the learning rate, is some small constant. The function h(r) is called the 'neighborhood function', in which r_ic is the distance from the position r_c of the winner node c to the position r_i of node i on the array of units. A widely used neighborhood function is the Gaussian function defined by

$$h(r_{ic}) = \exp\left( -\frac{r_{ic}^2}{2\sigma^2} \right).$$

We expect an ordered mapping after iterating the above procedure a sufficient number of times.

2.2 Asymmetric Neighborhood Function

We now introduce a method to transform any given symmetric neighborhood function into an asymmetric one (Fig. 2A). Let us define an asymmetry parameter β (β ≥ 1), representing the degree of asymmetry, and a unit vector k indicating the direction of asymmetry. If a unit i is located in the positive direction along k, the component of the winner-to-unit distance parallel to k is scaled by 1/β. If a unit i is located in the negative direction along k, the parallel component of the distance is scaled by β. Hence, the asymmetric function h_β(r), transformed from its symmetric counterpart h(r), is described by

$$h_\beta(r_{ic}) = \frac{2}{\beta^{-1} + \beta} \, h(\tilde{r}_{ic}), \qquad \tilde{r}_{ic} = \begin{cases} \sqrt{ \left( r_\parallel / \beta \right)^2 + \left\| r_\perp \right\|^2 } & (r_{ic} \cdot k \ge 0) \\[4pt] \sqrt{ \left( \beta \, r_\parallel \right)^2 + \left\| r_\perp \right\|^2 } & (r_{ic} \cdot k < 0) \end{cases} \tag{2}$$
where ˜r_ic is the scaled distance from the winner, r_∥ is the component of r_ic projected onto k, and r_⊥ is the remaining component perpendicular to k. In addition, in order to single out the effect of the asymmetry, the prefactor 2/(β⁻¹ + β) keeps the overall area of the neighborhood function, ∫_{−∞}^{∞} h_β(r) dr, unchanged.

In the special case of the asymmetry parameter β = 1, h_β(r) is equal to the original symmetric function h(r). Figure 2B displays an example of an asymmetric Gaussian neighborhood function in a two-dimensional array of SOM.
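The update rule (1) together with the asymmetric neighborhood (2) can be sketched for the one-dimensional case, where r_⊥ vanishes and k reduces to a sign. All parameter values below (α, β, σ) are illustrative choices, not the ones used in the paper's experiments.

```python
import numpy as np

def h_asym(r, beta=2.0, sigma=3.0):
    """Asymmetric Gaussian neighborhood of Eq. (2) in one dimension.

    r is the signed distance r_i - r_c; distances on the positive side
    are shrunk by 1/beta, those on the negative side stretched by beta.
    """
    r = np.asarray(r, dtype=float)
    scaled = np.where(r >= 0, np.abs(r) / beta, np.abs(r) * beta)
    return (2.0 / (1.0 / beta + beta)) * np.exp(-scaled**2 / (2 * sigma**2))

def som_step(m, x, alpha=0.05, beta=2.0, sigma=3.0):
    """One SOM update, Eq. (1), for a 1-D array of scalar reference vectors."""
    c = np.argmin(np.abs(x - m))          # winner: closest reference vector
    r = np.arange(len(m)) - c             # signed unit-to-winner distances
    m += alpha * h_asym(r, beta, sigma) * (x - m)
    return m

m = np.linspace(0.0, 1.0, 10)             # initial 1-D feature map
m = som_step(m, 0.37)                      # one input presentation
```

With β = 1 the prefactor becomes 1 and `h_asym` reduces to the ordinary symmetric Gaussian, so the same code covers both the conventional and the asymmetric algorithm.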