A New Constructive Algorithm for Designing and Training Artificial Neural Networks

Md. Abdus Sattar¹, Md. Monirul Islam¹,², and Kazuyuki Murase²,³

¹ Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
² Department of Human and Artificial Intelligence Systems, Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan
monirul@synapse.his.fukui-u.ac.jp
³ Research and Education Program for Life Science, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan

Abstract. This paper presents a new constructive algorithm, called the problem dependent constructive algorithm (PDCA), for designing and training artificial neural networks (ANNs). Unlike most previous studies, PDCA puts emphasis on function level adaptation as well as architectural adaptation. The architectural adaptation is done by automatically determining the number of hidden layers in an ANN and of neurons in those layers. The function level adaptation is done by training each hidden neuron with a different training set. PDCA uses a constructive approach to achieve both kinds of adaptation. It has been tested on a number of benchmark classification problems from machine learning and ANNs. The experimental results show that PDCA can produce ANNs with good generalization ability in comparison with other algorithms.

Keywords: Artificial neural networks (ANNs), architectural adaptation, function level adaptation, constructive approach, generalization ability.

1 Introduction

Artificial neural networks (ANNs) have been widely used in many application areas. Many issues and problems, such as the selection of training data, training algorithm and architecture, have to be addressed and resolved when using ANNs [8]. Among them, the proper selection of an ANN architecture is of great interest, because the performance of an ANN depends greatly on its architecture.
There have been many attempts at designing and training ANNs, such as various constructive, pruning and evolutionary approaches (see the review papers [4], [12] and [20]). The main problem of most existing approaches is that they can design either single hidden layered ANNs or multiple hidden layered ANNs with one neuron in each hidden layer [1], [5]-[7]. It is, however, quite difficult to decide

⋆ Corresponding author.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 317–327, 2008.
© Springer-Verlag Berlin Heidelberg 2008
in advance whether a problem can be solved efficiently by using single or multiple hidden layered ANNs. It is therefore necessary to devise an algorithm that can design both single and multiple hidden layered ANNs depending on the problem's complexity.

This paper proposes a new constructive algorithm, called the problem dependent constructive algorithm (PDCA), for designing and training feedforward ANNs. PDCA automatically determines not only the number of hidden layers in an ANN but also the number of neurons in those layers, using a constructive approach with a layer stopping criterion. PDCA's emphasis on training different hidden neurons with different training sets can increase the efficiency of determining an ANN's architecture automatically.

PDCA differs from previous work on designing and training ANNs in a number of respects. First, it can design both single and multiple hidden layered ANNs depending on the complexity of a given problem. This approach is quite different from most existing algorithms (e.g., [1] and [17]), which try to solve problems by using either single or multiple hidden layers but not both. Although single hidden layered ANNs are universal approximators [3], multiple hidden layered ANNs are superior to single hidden layered ANNs for some problems [18].

Second, all existing algorithms train the hidden neurons of an ANN on the same training set, whereas PDCA creates a new training set, based on the performance of the existing ANN architecture, whenever a new neuron is added. Although this approach is used by the boosting algorithm [15] for designing ANN ensembles, this is, to the best of our knowledge, the first attempt to use the concept in designing single ANNs.

Third, most existing algorithms (e.g., [1], [5]-[7] and [9]) have no effective mechanism for stopping the addition of neurons to hidden layers.
Consequently, they either use only one neuron in each hidden layer, resulting in very deep architectures and long propagation delays, which is also unsuitable for VLSI implementation, or they use a predefined and fixed number of neurons for all hidden layers [10]. The problem with a fixed number of neurons lies in the difficulty of selecting a number that is appropriate for a given problem. To address these problems, PDCA uses a layer stopping criterion that automatically determines the number of neurons in each hidden layer.

The rest of the paper is organized as follows. Section 2 describes PDCA in detail. Section 3 presents the results of our experimental study. Finally, Section 4 concludes the paper with a brief summary and a few remarks.

2 PDCA
To determine automatically the number of hidden layers in an ANN and of neurons in those layers, PDCA combines incremental training with a layer stopping criterion. In the incremental training, hidden layers and hidden neurons are added to the ANN architecture one by one in a constructive fashion during training. The layer stopping criterion decides when to add a new hidden layer by stopping the growth, i.e., the addition of neurons, of the current hidden layer. To obtain an efficient solution, PDCA trains each hidden
neuron in an ANN with a different training set and stops the ANN construction process automatically.

Although any kind of ANN and activation function can be used with PDCA, in this work we used PDCA to design feedforward ANNs with sigmoid activation functions. The feedforward ANNs considered here are generalized multilayer perceptrons. In such an architecture, the first hidden layer receives only the network inputs (I), while every other hidden layer receives I plus the outputs of the preceding hidden layer(s). The output layer receives signals from all hidden layers. The major steps of PDCA are summarized in Fig. 1 and explained below.

Fig. 1. Flowchart of PDCA

Step 1. Create an initial ANN architecture consisting of three layers, i.e., an input layer, a hidden layer and an output layer. The numbers of neurons in the input and output layers are the same as the numbers of inputs and outputs of the given problem, respectively. Initially, the hidden layer contains only one neuron. Randomly initialize the connection weights of the ANN within a certain range and label the hidden layer I.

Step 2. Create a new training set for the newly added hidden neuron based on the performance of the existing ANN architecture. PDCA uses the adaboost.M2 algorithm [16], a variant of the boosting algorithm [15], to create training sets. Note that the original training set is used for training the initial architecture.

Step 3. Partially train the ANN with the backpropagation learning algorithm for a certain number of training epochs. This phase is known as the initial training of the existing ANN architecture. The number of epochs, τ, is specified by the user. Partial training means that the ANN is trained for a fixed number of epochs regardless of whether it has converged or not.

Step 4. Check the termination criterion for stopping the ANN construction process. If the criterion is satisfied, go to Step 12. Otherwise continue.

Step 5. Compute the ANN error E on the training set. If E is reduced by at least a threshold amount after the τ training epochs, go to Step 3 for further training of the existing architecture; it is assumed here that the training process is progressing well and the existing architecture should be trained further. Otherwise continue with the final training of the existing architecture.

Step 6. Add a small amount of noise to the input and output connection weights of the previously added neuron in the I-labeled hidden layer. Partially train the ANN with the backpropagation learning algorithm for τ epochs. This phase is known as the final training of the existing ANN architecture.

Step 7. Check the termination criterion for stopping the ANN construction process. If the criterion is satisfied, go to Step 12. Otherwise continue.

Step 8.
Compute E on the training set. If E is reduced by at least a threshold amount after the τ training epochs, go to Step 6 for further training of the existing architecture; it is assumed here that the final training phase is progressing well and the ANN should be trained further. Otherwise continue by modifying the existing architecture, adding hidden neurons or layers.

Step 9. Check the criterion for stopping the growth of the I-labeled hidden layer. If the criterion is satisfied, stop the construction of the I-labeled hidden layer by freezing the input and output connectivities of the previously added neuron in that layer, and continue. Otherwise go to Step 11 to add a neuron to the hidden layer. Freezing, first introduced in [1], means that the frozen connection weights will not be trained, i.e., changed, when the ANN is trained in future.

Step 10. Replace the label I of the I-labeled hidden layer with the label F. Add a new hidden layer above the existing hidden layer(s) of the ANN. Initially the new hidden layer contains one neuron and is labeled I. The connection weights of the neuron are initialized as described in Step 1; go to Step 2.

Step 11. Add one neuron to the I-labeled hidden layer and freeze the input and output connectivities of the previously added neuron in this layer. The
connection weights of the newly added neuron are initialized as described in Step 1; go to Step 2.

Step 12. The existing ANN architecture is the final architecture for the given problem.
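The twelve steps above amount to two nested growth loops: grow the current hidden layer neuron by neuron, and open a new hidden layer when the layer stopping criterion fires. The control flow can be sketched as follows (a minimal, runnable sketch: the training steps, error updates and stopping tests are numeric stand-ins, not the authors' implementation):

```python
def pdca_skeleton(max_layers=3, max_neurons=4):
    """Control-flow sketch of PDCA (Steps 1-12). Only the loop
    structure follows the paper; training and the stopping tests
    are replaced by simple numeric stand-ins."""
    ann = {"layers": [1]}  # Step 1: one hidden layer with one neuron
    error = 1.0            # stand-in for the ANN error E
    while True:
        # Step 2 would create a new training set here (adaboost.M2).
        error *= 0.9       # Steps 3-5: initial partial training (stand-in)
        error *= 0.9       # Steps 6-8: final partial training (stand-in)
        if error < 0.05:   # Steps 4/7: termination criterion (stand-in)
            return ann     # Step 12: final architecture
        if ann["layers"][-1] < max_neurons:  # layer still allowed to grow?
            ann["layers"][-1] += 1           # Step 11: add one neuron
        elif len(ann["layers"]) < max_layers:
            ann["layers"].append(1)          # Step 10: open a new hidden layer
        else:
            return ann
```

With these placeholder numbers the sketch grows three hidden layers of four neurons each before terminating.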
Because PDCA trains only one neuron at a time, namely the newly added one, other nonlinear optimization methods, such as the BFGS and other quasi-Newton methods [17], which are computationally expensive but converge faster, can easily be used in PDCA for training ANNs. Although the design of ANNs could be formulated as a multi-objective optimization problem, PDCA uses a very simple cost function, the ANN error. The processes and criteria incorporated in PDCA at its different stages are described briefly in the following subsections.

2.1 Termination Criterion

PDCA uses a criterion based on both training and validation errors to decide when the training process of an ANN should be stopped. To describe the criterion formally, let E_va(τ) and E_opt(τ) be the validation error at training epoch τ and the lowest validation error obtained in epochs up to τ, respectively. The generalization loss GL at epoch τ is defined by the following equation [11]:

GL(τ) = E_va(τ) / E_opt(τ) − 1.    (1)

A high generalization loss is one obvious reason to stop training, because it directly indicates overfitting. However, it is undesirable to stop the training process while the training error E_tr is still decreasing rapidly. To formalize this notion, let a training strip of length k be a sequence of k epochs numbered n + 1, ..., n + k, where n is divisible by k. The training progress P_k measures how much larger the average training error of the strip is than the minimum training error during the strip. It is defined by the following equation [11]:

P_k(τ) = [ Σ_{τ′=τ−k+1}^{τ} E_tr(τ′) ] / [ k · min_{τ′=τ−k+1}^{τ} E_tr(τ′) ] − 1.    (2)

PDCA terminates the training process when GL(τ)/P_k(τ) > α, where α is a user specified positive number. Both training and validation data are used in the termination condition in order to anticipate the behavior of the test data better.
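Equations (1) and (2) and the termination test translate directly into code (a sketch; the error traces are hypothetical lists indexed by epoch, and a strip with a perfectly flat training error, where P_k would be zero, is not handled):

```python
def generalization_loss(e_va, tau):
    """GL(tau) = E_va(tau) / E_opt(tau) - 1, Eq. (1).
    e_va: validation errors, one entry per epoch."""
    e_opt = min(e_va[: tau + 1])  # lowest validation error up to tau
    return e_va[tau] / e_opt - 1.0

def training_progress(e_tr, tau, k):
    """P_k(tau), Eq. (2): average training error over the last k
    epochs divided by (k * minimum error in the strip), minus 1."""
    strip = e_tr[tau - k + 1 : tau + 1]
    return sum(strip) / (k * min(strip)) - 1.0

def stop_training(e_va, e_tr, tau, k, alpha):
    """PDCA's termination test: stop when GL(tau)/P_k(tau) > alpha."""
    return generalization_loss(e_va, tau) / training_progress(e_tr, tau, k) > alpha
```

Intuitively, a rising validation error (large GL) is tolerated as long as the training error is still falling fast (large P_k); the ratio fires only when progress has flattened out.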
2.2 Layer Stopping Criterion

PDCA uses a simple criterion to decide when to stop the growth of an I-labeled hidden layer. The criterion is based on the contribution of the neurons in a hidden layer. The contribution C_k of a neuron k at any training epoch is

C_k = 1/E − 1/E_k,    (3)

where E is the network error and E_k is the network error excluding neuron k. The layer stopping criterion stops the growth of an I-labeled hidden layer when its contribution to the ANN, measured after the addition of each hidden neuron, has failed to improve after the addition of a certain number of neurons, indicated by the parameter m_h, i.e., when the following is true:

C_k(m + m_h) ≤ C_k(m), m = 1, 2, . . . ,    (4)

where m_h (m_h > 0) is a user specified positive integer. If m_h = 0, then every hidden layer of the ANN would consist of only one hidden neuron, as in CCA [1]. In PDCA, each hidden layer can consist of several neurons because m_h > 0. No neuron is added to a hidden layer after its growth process has been stopped. Furthermore, the neurons of a hidden layer whose growth process has been stopped are not trained any further.
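Equations (3) and (4) can be sketched as follows (a sketch: E_k, the network error with neuron k removed, would in practice come from an ablation pass; here both errors are plain numbers, and `c_history` is a hypothetical record of the layer's contribution after each added neuron):

```python
def contribution(e, e_k):
    """C_k = 1/E - 1/E_k, Eq. (3). e is the network error E,
    e_k the error with neuron k excluded; a positive value means
    the neuron helps (removing it raises the error)."""
    return 1.0 / e - 1.0 / e_k

def stop_layer_growth(c_history, m_h):
    """Eq. (4): stop growing the I-labeled layer once the contribution
    recorded m_h neuron-additions later fails to exceed an earlier one."""
    for m in range(len(c_history) - m_h):
        if c_history[m + m_h] <= c_history[m]:
            return True
    return False
```

For example, a contribution trace that keeps rising never triggers the criterion, while one that stalls within `m_h` additions does.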
2.3 Creation of New Training Sets

The adaboost.M2 algorithm [16], originally proposed for training ANN ensembles, is used in PDCA to create different training sets for the different hidden neurons of an ANN. It maintains a probability distribution D over the original training set T. Initially, D = 1/M, where M is the number of examples in T. The algorithm trains the first ANN of the ensemble on the original training set T. After training the first ANN, D is updated so that the probability of incorrectly classified examples is increased and that of correctly classified examples is decreased. A new training set T′ is created from the updated D by sampling M examples at random, with replacement, from T. The second ANN of the ensemble is then trained on T′. This process is repeated for the other ANNs in the ensemble.

The strategy used by adaboost for training the ANNs of an ensemble can easily be incorporated in PDCA for training the hidden neurons of an ANN, because PDCA trains the hidden neurons one by one, much as adaboost trains the ANNs of an ensemble one after another. In addition, PDCA trains only one neuron, i.e., the newly added hidden neuron, at a time, freezing the input and output connectivities of the previously added neuron; this, too, is similar to training one ANN of the ensemble at a time with adaboost. The use of different training sets at different stages of the training process helps PDCA achieve functional adaptation.

3 Experimental Studies

This section evaluates the performance of PDCA on several benchmark classification problems. Table 1 summarizes the characteristics of the problems, which display considerable diversity in the numbers of examples, attributes and classes. Detailed descriptions of these problems can be obtained from [13], except for iris and letter, which can be obtained from the UCI Machine Learning Repository.
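The training-set creation of Sect. 2.3 can be sketched as follows (a simplified boosting-style reweighting, not the exact adaboost.M2 update; `correct` is a hypothetical list of flags marking which examples the current network classifies correctly):

```python
import random

def resample(examples, weights, rng):
    """Draw len(examples) examples with replacement, with probability
    proportional to weights: the new training set T' of Sect. 2.3."""
    return rng.choices(examples, weights=weights, k=len(examples))

def update_weights(weights, correct, factor=0.5):
    """Down-weight correctly classified examples and renormalize, so
    the next neuron's training set emphasises the current errors.
    (Simplified; adaboost.M2 uses a loss-dependent update.)"""
    w = [wi * (factor if ok else 1.0) for wi, ok in zip(weights, correct)]
    s = sum(w)
    return [wi / s for wi in w]

# Usage: start uniform, reweight after evaluating the current network,
# then resample a fresh training set for the newly added neuron.
rng = random.Random(0)
weights = update_weights([0.25] * 4, [True, True, False, False])
new_set = resample(list("abcd"), weights, rng)
```

Misclassified examples end up with twice the sampling probability of correctly classified ones in this sketch, so the new neuron concentrates on what the existing architecture gets wrong.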
Table 1. Characteristics of the experimental data sets

Data set   Input       Output   Training   Validation   Testing
           attributes  classes  examples   examples     examples
Cancer         9          2        350        175         174
Card          14          2        345        173         172
Diabetes       8          2        384        192         192
Glass          9          6        107         54          53
Gene         120          3       1588        794         793
Iris           9          2        107         54          53
Letter        16         26      10000       5000        5000
Thyroid       21          3       3600       1800        1800
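The sequential train/validation/test partitioning behind Table 1 can be reproduced with fractions of roughly 50/25/25 percent, an assumption illustrated here with the Diabetes row (384/192/192 of 768 examples):

```python
def partition(examples, frac_train=0.5, frac_val=0.25):
    """Sequential split as described in the setup: the first M examples
    for training, the next N for validation, the remainder for testing.
    The fractions are an assumption, not stated by the paper."""
    m = int(len(examples) * frac_train)
    n = int(len(examples) * frac_val)
    return examples[:m], examples[m:m + n], examples[m + n:]
```

Because the split is sequential rather than shuffled, it matches the paper's caveat that such partitions are not necessarily optimal in practice.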
A. Experimental Setup

Table 1 shows the partitioning of each data set and the number of examples in each partition. The partitions are the training set, the validation set and the testing set. In every data set, the first M examples were used for the training set, the following N examples for the validation set, and the final P examples for the testing set. It should be kept in mind that such partitions are not in practice the optimal ones. The training set and the testing set are used, respectively, to train the ANNs and to evaluate the generalization ability of the trained ANNs, while the union of the training and validation sets is used to decide whether to add hidden layers or neurons, or to stop training altogether.

In all experiments, one bias neuron with a fixed input of +1 is connected to all hidden layers and to the output layer. The logistic sigmoid function is used for