A New Constructive Algorithm for Designing and Training Artificial Neural Networks

Md. Abdus Sattar¹, Md. Monirul Islam¹,², and Kazuyuki Murase²,³

¹ Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
² Department of Human and Artificial Intelligence Systems, Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan
monirul@synapse.his.fukui-u.ac.jp
³ Research and Education Program for Life Science, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan

Abstract. This paper presents a new constructive algorithm, called the problem dependent constructive algorithm (PDCA), for designing and training artificial neural networks (ANNs). Unlike most previous studies, PDCA puts emphasis on function level adaptation as well as architectural adaptation. The architectural adaptation is done by automatically determining the number of hidden layers in an ANN and of neurons in those layers. The function level adaptation is done by training each hidden neuron with a different training set. PDCA uses a constructive approach to achieve both kinds of adaptation. It has been tested on a number of benchmark classification problems from machine learning and ANNs. The experimental results show that PDCA can produce ANNs with good generalization ability in comparison with other algorithms.

Keywords: Artificial neural networks (ANNs), architectural adaptation, function level adaptation, constructive approach, generalization ability.

1 Introduction

Artificial neural networks (ANNs) have been widely used in many application areas. Many issues and problems, such as the selection of training data, training algorithm and architecture, have to be addressed and resolved when using ANNs [8]. Among them, the proper selection of an ANN architecture is of great interest, because the performance of an ANN depends greatly on its architecture.
There have been many attempts at designing and training ANNs, such as various constructive, pruning and evolutionary approaches (see the review papers [4], [12] and [20]). The main problem of most existing approaches is that they can design either single hidden layered ANNs or multiple hidden layered ANNs with one neuron in each hidden layer [1], [5]-[7]. It is, however, quite difficult to decide

⋆ Corresponding author.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 317–327, 2008.
© Springer-Verlag Berlin Heidelberg 2008
in advance whether a problem can be solved efficiently by using single or multiple hidden layered ANNs. It is therefore necessary to devise an algorithm that can design both single and multiple hidden layered ANNs depending on the problem's complexity.

This paper proposes a new constructive algorithm, called the problem dependent constructive algorithm (PDCA), for designing and training feedforward ANNs. PDCA automatically determines not only the number of hidden layers in an ANN but also the number of neurons in those layers, using a constructive approach with a layer stopping criterion. PDCA's emphasis on training different hidden neurons with different training sets can increase the efficiency of determining an ANN's architecture automatically.

PDCA differs from previous work on designing and training ANNs in a number of respects. First, it can design both single and multiple hidden layered ANNs depending on the complexity of a given problem. This approach is quite different from most existing algorithms (e.g., [1] and [17]), which try to solve problems by using either single or multiple hidden layers but not both. Although single hidden layered ANNs are universal approximators [3], multiple hidden layered ANNs are superior to single hidden layered ANNs for some problems [18].

Second, all existing algorithms train the hidden neurons of an ANN on the same training set, whereas PDCA creates a new training set, based on the performance of the existing ANN architecture, whenever a new neuron is added. Although this approach is used by the boosting algorithm [15] for designing ANN ensembles, this is, to the best of our knowledge, the first attempt to use the concept in designing single ANNs.

Third, most existing algorithms (e.g., [1], [5]-[7] and [9]) have no effective mechanism for stopping the addition of neurons to hidden layers.
Consequently, they either use only one neuron in each hidden layer, resulting in very deep architectures and long propagation delays, which is also unsuitable for VLSI implementation, or they use a predefined and fixed number of neurons for all hidden layers [10]. The problem with a fixed number of neurons lies in the difficulty of selecting a number that is appropriate for a given problem. To address these problems, PDCA uses a layer stopping criterion that automatically determines the number of neurons in each hidden layer.

The rest of the paper is organized as follows. Section 2 describes PDCA in detail. Section 3 presents the results of our experimental study. Finally, Section 4 concludes the paper with a brief summary and a few remarks.

2 PDCA
To determine automatically the number of hidden layers in an ANN and of neurons in those layers, PDCA combines incremental training with a layer stopping criterion. In the incremental training, hidden layers and hidden neurons are added to the ANN architecture one by one in a constructive fashion during training. The layer stopping criterion decides when to add a new hidden layer by stopping the growth, i.e., the addition of neurons, of the current hidden layer. To obtain an efficient solution, PDCA trains each hidden
neuron in an ANN with a different training set and stops the ANN construction process automatically.

Although any kind of ANN and activation function can be used with PDCA, in this work we used PDCA to design feedforward ANNs with sigmoid activation functions. The feedforward ANNs considered here are generalized multilayer perceptrons. In such an architecture, the first hidden layer receives only the network inputs (I), while every other hidden layer receives I plus the outputs of the preceding hidden layer(s). The output layer receives signals from all hidden layers. The major steps of PDCA are summarized in Fig. 1 and explained below.

Fig. 1. Flowchart of PDCA

Step 1. Create an initial ANN architecture consisting of three layers, i.e., an input layer, a hidden layer and an output layer. The numbers of neurons in the input and output layers are the same as the numbers of inputs and outputs of the given problem, respectively. Initially, the hidden layer contains only one neuron. Randomly initialize the connection weights of the ANN within a certain range and label the hidden layer I.

Step 2. Create a new training set for the newly added hidden neuron based on the performance of the existing ANN architecture. PDCA uses the adaboost.M2 algorithm [16], a variant of the boosting algorithm [15], to create training sets. Note that the original training set is used for training the initial architecture.

Step 3. Partially train the ANN with the backpropagation learning algorithm for a certain number of training epochs. This phase is known as the initial training of the existing ANN architecture. The number of epochs, τ, is specified by the user. Partial training means that the ANN is trained for a fixed number of epochs regardless of whether it has converged or not.

Step 4. Check the termination criterion for stopping the ANN construction process. If the criterion is satisfied, go to Step 12. Otherwise continue.

Step 5. Compute the ANN error E on the training set. If E is reduced by at least a threshold amount after the τ training epochs, go to Step 3 for further training of the existing architecture; it is assumed here that the training process is progressing well and the existing architecture should be trained further. Otherwise continue with the final training of the existing architecture.

Step 6. Add a small amount of noise to the input and output connection weights of the previously added neuron in the I-labeled hidden layer. Partially train the ANN with the backpropagation learning algorithm for τ epochs. This phase is known as the final training of the existing ANN architecture.

Step 7. Check the termination criterion for stopping the ANN construction process. If the criterion is satisfied, go to Step 12. Otherwise continue.

Step 8.
Compute E on the training set. If E is reduced by at least a threshold amount after the τ training epochs, go to Step 6 for further training of the existing architecture; it is assumed here that the final training phase is progressing well and the ANN should be trained further. Otherwise continue by modifying the existing architecture, adding hidden neurons or layers.

Step 9. Check the criterion for stopping the growth of the I-labeled hidden layer. If the criterion is satisfied, stop the construction of the I-labeled hidden layer by freezing the input and output connectivities of the previously added neuron in that layer, and continue. Otherwise go to Step 11 to add a neuron to the hidden layer. Freezing, first introduced in [1], means that the frozen connection weights will not be trained, i.e., changed, when the ANN is trained in future.

Step 10. Replace the label I of the I-labeled hidden layer with the label F. Add a new hidden layer above the existing hidden layer(s) of the ANN. Initially the new hidden layer contains one neuron and is labeled I. The connection weights of the neuron are initialized as described in Step 1; go to Step 2.

Step 11. Add one neuron to the I-labeled hidden layer and freeze the input and output connectivities of the previously added neuron in this layer. The
connection weights of the newly added neuron are initialized as described in Step 1; go to Step 2.

Step 12. The existing ANN architecture is the final architecture for the given problem.
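The twelve steps above amount to two nested growth loops: grow the current hidden layer neuron by neuron, and open a new hidden layer when the layer stopping criterion fires. The control flow can be sketched as follows (a minimal, runnable sketch: the training steps, error updates and stopping tests are numeric stand-ins, not the authors' implementation):

```python
def pdca_skeleton(max_layers=3, max_neurons=4):
    """Control-flow sketch of PDCA (Steps 1-12). Only the loop
    structure follows the paper; training and the stopping tests
    are replaced by simple numeric stand-ins."""
    ann = {"layers": [1]}  # Step 1: one hidden layer with one neuron
    error = 1.0            # stand-in for the ANN error E
    while True:
        # Step 2 would create a new training set here (adaboost.M2).
        error *= 0.9       # Steps 3-5: initial partial training (stand-in)
        error *= 0.9       # Steps 6-8: final partial training (stand-in)
        if error < 0.05:   # Steps 4/7: termination criterion (stand-in)
            return ann     # Step 12: final architecture
        if ann["layers"][-1] < max_neurons:  # layer still allowed to grow?
            ann["layers"][-1] += 1           # Step 11: add one neuron
        elif len(ann["layers"]) < max_layers:
            ann["layers"].append(1)          # Step 10: open a new hidden layer
        else:
            return ann
```

With these placeholder numbers the sketch grows three hidden layers of four neurons each before terminating.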
Because PDCA trains only one neuron at a time, namely the newly added one, other nonlinear optimization methods, such as the BFGS and other quasi-Newton methods [17], which are computationally expensive but converge faster, can easily be used in PDCA for training ANNs. Although the design of ANNs could be formulated as a multi-objective optimization problem, PDCA uses a very simple cost function, the ANN error. The processes and criteria incorporated in PDCA at its different stages are described briefly in the following subsections.

2.1 Termination Criterion

PDCA uses a criterion based on both training and validation errors to decide when the training process of an ANN should be stopped. To describe the criterion formally, let E_va(τ) and E_opt(τ) be the validation error at training epoch τ and the lowest validation error obtained in epochs up to τ, respectively. The generalization loss GL at epoch τ is defined by the following equation [11]:

GL(τ) = E_va(τ) / E_opt(τ) − 1.    (1)

A high generalization loss is one obvious reason to stop training, because it directly indicates overfitting. However, it is undesirable to stop the training process while the training error E_tr is still decreasing rapidly. To formalize this notion, let a training strip of length k be a sequence of k epochs numbered n + 1, ..., n + k, where n is divisible by k. The training progress P_k measures how much larger the average training error of the strip is than the minimum training error during the strip. It is defined by the following equation [11]:

P_k(τ) = [ Σ_{τ′=τ−k+1}^{τ} E_tr(τ′) ] / [ k · min_{τ′=τ−k+1}^{τ} E_tr(τ′) ] − 1.    (2)

PDCA terminates the training process when GL(τ)/P_k(τ) > α, where α is a user specified positive number. Both training and validation data are used in the termination condition in order to anticipate the behavior of the test data better.
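Equations (1) and (2) and the termination test translate directly into code (a sketch; the error traces are hypothetical lists indexed by epoch, and a strip with a perfectly flat training error, where P_k would be zero, is not handled):

```python
def generalization_loss(e_va, tau):
    """GL(tau) = E_va(tau) / E_opt(tau) - 1, Eq. (1).
    e_va: validation errors, one entry per epoch."""
    e_opt = min(e_va[: tau + 1])  # lowest validation error up to tau
    return e_va[tau] / e_opt - 1.0

def training_progress(e_tr, tau, k):
    """P_k(tau), Eq. (2): average training error over the last k
    epochs divided by (k * minimum error in the strip), minus 1."""
    strip = e_tr[tau - k + 1 : tau + 1]
    return sum(strip) / (k * min(strip)) - 1.0

def stop_training(e_va, e_tr, tau, k, alpha):
    """PDCA's termination test: stop when GL(tau)/P_k(tau) > alpha."""
    return generalization_loss(e_va, tau) / training_progress(e_tr, tau, k) > alpha
```

Intuitively, a rising validation error (large GL) is tolerated as long as the training error is still falling fast (large P_k); the ratio fires only when progress has flattened out.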
2.2 Layer Stopping Criterion

PDCA uses a simple criterion to decide when to stop the growth of an I-labeled hidden layer. The criterion is based on the contribution of the neurons in a hidden layer. The contribution C_k of a neuron k at any training epoch is

C_k = 1/E − 1/E_k,    (3)

where E is the network error and E_k is the network error excluding neuron k. The layer stopping criterion stops the growth of an I-labeled hidden layer when its contribution to the ANN, measured after the addition of each hidden neuron, has failed to improve after the addition of a certain number of neurons, indicated by the parameter m_h, i.e., when the following is true:

C_k(m + m_h) ≤ C_k(m), m = 1, 2, . . . ,    (4)

where m_h (m_h > 0) is a user specified positive integer. If m_h = 0, then every hidden layer of the ANN would consist of only one hidden neuron, as in CCA [1]. In PDCA, each hidden layer can consist of several neurons because m_h > 0. No neuron is added to a hidden layer after its growth process has been stopped. Furthermore, the neurons of a hidden layer whose growth process has been stopped are not trained any further.
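Equations (3) and (4) can be sketched as follows (a sketch: E_k, the network error with neuron k removed, would in practice come from an ablation pass; here both errors are plain numbers, and `c_history` is a hypothetical record of the layer's contribution after each added neuron):

```python
def contribution(e, e_k):
    """C_k = 1/E - 1/E_k, Eq. (3). e is the network error E,
    e_k the error with neuron k excluded; a positive value means
    the neuron helps (removing it raises the error)."""
    return 1.0 / e - 1.0 / e_k

def stop_layer_growth(c_history, m_h):
    """Eq. (4): stop growing the I-labeled layer once the contribution
    recorded m_h neuron-additions later fails to exceed an earlier one."""
    for m in range(len(c_history) - m_h):
        if c_history[m + m_h] <= c_history[m]:
            return True
    return False
```

For example, a contribution trace that keeps rising never triggers the criterion, while one that stalls within `m_h` additions does.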
2.3 Creation of New Training Sets

The adaboost.M2 algorithm [16], originally proposed for training ANN ensembles, is used in PDCA to create different training sets for the different hidden neurons of an ANN. It maintains a probability distribution D over the original training set T. Initially, D = 1/M, where M is the number of examples in T. The algorithm trains the first ANN of the ensemble on the original training set T. After training the first ANN, D is updated so that the probability of incorrectly classified examples is increased and that of correctly classified examples is decreased. A new training set T′ is created from the updated D by sampling M examples at random, with replacement, from T. The second ANN of the ensemble is then trained on T′. This process is repeated for the other ANNs in the ensemble.

The strategy used by adaboost for training the ANNs of an ensemble can easily be incorporated in PDCA for training the hidden neurons of an ANN, because PDCA trains the hidden neurons one by one, much as adaboost trains the ANNs of an ensemble one after another. In addition, PDCA trains only one neuron, i.e., the newly added hidden neuron, at a time, freezing the input and output connectivities of the previously added neuron; this, too, is similar to training one ANN of the ensemble at a time with adaboost. The use of different training sets at different stages of the training process helps PDCA achieve functional adaptation.

3 Experimental Studies

This section evaluates the performance of PDCA on several benchmark classification problems. Table 1 summarizes the characteristics of the problems, which display considerable diversity in the numbers of examples, attributes and classes. Detailed descriptions of these problems can be obtained from [13], except for iris and letter, which can be obtained from the UCI Machine Learning Repository.
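The training-set creation of Sect. 2.3 can be sketched as follows (a simplified boosting-style reweighting, not the exact adaboost.M2 update; `correct` is a hypothetical list of flags marking which examples the current network classifies correctly):

```python
import random

def resample(examples, weights, rng):
    """Draw len(examples) examples with replacement, with probability
    proportional to weights: the new training set T' of Sect. 2.3."""
    return rng.choices(examples, weights=weights, k=len(examples))

def update_weights(weights, correct, factor=0.5):
    """Down-weight correctly classified examples and renormalize, so
    the next neuron's training set emphasises the current errors.
    (Simplified; adaboost.M2 uses a loss-dependent update.)"""
    w = [wi * (factor if ok else 1.0) for wi, ok in zip(weights, correct)]
    s = sum(w)
    return [wi / s for wi in w]

# Usage: start uniform, reweight after evaluating the current network,
# then resample a fresh training set for the newly added neuron.
rng = random.Random(0)
weights = update_weights([0.25] * 4, [True, True, False, False])
new_set = resample(list("abcd"), weights, rng)
```

Misclassified examples end up with twice the sampling probability of correctly classified ones in this sketch, so the new neuron concentrates on what the existing architecture gets wrong.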
Table 1. Characteristics of the experimental data sets

Data set   Input       Output   Training   Validation   Testing
           attributes  classes  examples   examples     examples
Cancer         9          2        350        175         174
Card          14          2        345        173         172
Diabetes       8          2        384        192         192
Glass          9          6        107         54          53
Gene         120          3       1588        794         793
Iris           9          2        107         54          53
Letter        16         26      10000       5000        5000
Thyroid       21          3       3600       1800        1800
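The sequential train/validation/test partitioning behind Table 1 can be reproduced with fractions of roughly 50/25/25 percent, an assumption illustrated here with the Diabetes row (384/192/192 of 768 examples):

```python
def partition(examples, frac_train=0.5, frac_val=0.25):
    """Sequential split as described in the setup: the first M examples
    for training, the next N for validation, the remainder for testing.
    The fractions are an assumption, not stated by the paper."""
    m = int(len(examples) * frac_train)
    n = int(len(examples) * frac_val)
    return examples[:m], examples[m:m + n], examples[m + n:]
```

Because the split is sequential rather than shuffled, it matches the paper's caveat that such partitions are not necessarily optimal in practice.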
A. Experimental Setup

Table 1 shows the partitioning of each data set and the number of examples in each partition. The partitions are the training set, the validation set and the testing set. In every data set, the first M examples were used for the training set, the following N examples for the validation set, and the final P examples for the testing set. It should be kept in mind that such partitions are not in practice the optimal ones. The training set and the testing set are used, respectively, to train the ANNs and to evaluate the generalization ability of the trained ANNs, while the union of the training and validation sets is used to decide whether to add hidden layers or neurons, or to stop training altogether.

In all experiments, one bias neuron with a fixed input of +1 is connected to all hidden layers and to the output layer. The logistic sigmoid function is used for