Lecture Notes in Computer Science

Model of Cue Extraction from Distractors by Active Recall

Adam Ponzi

Laboratory for Dynamics of Emergent Intelligence, RIKEN BSI, Saitama, Japan

adam@brain.riken.jp

Abstract. Cues are informative signals animals must use to make decisions in order to obtain rewards, usually after intervening temporal delays, typified in the cue-action-reward task. In behavioural experiments the cue is often clearly distinguished from other stimuli by a salience such as brightness; however, in the real world animals face the problem of recognizing real cues from among other distracting environmental stimuli. Furthermore, once the cue is recognized it must cause the animal to make a certain action to obtain reward. Therefore the animal faces a compound chicken-and-egg problem to obtain reward. First it must recognize the cue and then it must learn that the cue must initiate a certain action. But how can the animal recognize the cue before it has learned the action to obtain reward, since in this initial learning stage the cue is only partially predictive of reward? Here we present a simple neural network model of how animals extract cues from background distractor stimuli, all presented with equal salience, based on successive testing of different stimulus-action allocations over several trials. A stimulus is selected and gated into working memory to drive an action and then reactivated at the end period to be reinforced if correct. If the stimulus is not reinforced over several trials it is suppressed and a different stimulus is selected. If the stimulus is a real cue but it drives the incorrect action, its cue-action allocation is suppressed. This mechanism is enhanced by the property of cue mutual exclusion in trials, which also provides a simple model of bottom-up attention and pop-out. The model is based on the cortical and hippocampal projections to the dopamine system through the striatum, including a model of salience gated working memory and a reinforcement and punishment system based on dopamine feedback balance. We illustrate the model by numerical simulations of a rat learning to navigate a T-maze and show how it deterministically discovers the correct cue-action allocations.

1 Introduction

The Basal Ganglia are well known to be involved in reward learning mechanisms [1,2]

and cortico-basal ganglia loops are critical for the learning of rewarded cued procedures

and in cued working memory tasks [3]. Dopamine is thought to play the role of reward

prediction error [1,4], where the burst ﬁring of dopamine cells is increased by unex-

pected rewards and reduced if an expected reward is omitted. The striatum, which is

the main input structure of the Basal Ganglia, receives a strong input from the midbrain

dopaminergic system and the prominent striatal projection neurons, the medium spiny

neurons, are also known to reﬂect reward expectation themselves [5,6]. Dopamine ﬁring

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 269–278, 2008.

© Springer-Verlag Berlin Heidelberg 2008


can also be triggered by novel stimuli that do not involve reward [7]. Rapid detection of

a cue change by striatal neurons has been studied by Pasupathy [3]. The basal-ganglia,

the prefrontal cortex and the midbrain dopamine nuclei projecting to them are also

strongly implicated in working memory [8]. Persistent neural activity in recurrent cir-

cuits plays a central role in the maintenance of information in working memory [9].

Some theories suggest a gating role for phasic dopamine release, such that dopamine

release is required for read-in to working memory [10,11,12]. Seemingly paradoxically,

dopamine can either suppress or enhance striatal activity [13], and can extend the duration of

enhanced activity.

In spatial working memory tasks some hippocampal neurons (‘splitter cells’ or

‘episodic cells’) ﬁre selectively dependent on the context of a speciﬁc recent response

or future goal [14,15,18]. For example, during performance of a spatial alternation task,

many hippocampal pyramidal neurons ﬁre selectively after right turn or left turn tri-

als, even though the rat is running in the same direction through the same location on

the stem of the maze on both types of trials. Ferbinteanu et al. [17] describe retrospective

and prospective coding in hippocampal neural assemblies in spatial working

memory tasks. The origin of journeys inﬂuenced ﬁring even when rats made detours

showing that recent memory modulated neuronal activity more than spatial trajectory.

Diminished retrospective and prospective coding was observed in error trials, suggesting

this signal was important for task performance. Mulder et al. [19] recorded from hippocampal

output structures associated with the motor system (nucleus accumbens and

ventromedial caudate nucleus) in rats solving a plus-maze. They found a variety of

responses including neurons that ﬁred continuously from the moment the rat left one

location until it arrived at a goal site, or at an intermediate place, such as the maze

centre. They suggest their results support the view that the ventral striatum provides an

interface between limbic and motor systems, permitting contextual representations to

trigger movements and have an impact on action sequences in goal directed behaviour.

Barnes et al. [20] have also described task and expert neurons in dorsal striatum.

2 Model

This model is composed of two parts, the cue detection part and the action selection

and reinforcement part. The model is an extension of the model presented in [21,22]. In

those papers we describe a cue response task typical of primate studies which consists

of three stages: stage 1 is the cue presentation stage, stage 2 the action selection stage

and stage 3 the reward stage, where the stages are separated by delay periods. In the

work presented here, this model is re-motivated as a spatial working memory temporal

credit assignment model typical for rat maze tasks and described in Fig.1(a). In stage

1 a cue is presented together with some distractors. Here we suggest this corresponds

to the spatial view of the animal from the initial location. The spatial views from each

of the two initial locations have some parts in common and also some differences. For

example a visual cue may be visible from one location but not the other, or the same cue

may be visible, but in a different location with respect to the animal’s head direction,

or other background cues. Here we simplify these possibilities and represent the view

simply as an activation of several units of the input ‘P ’ layer. In initial position ‘A’


a fixed set of background distractors is activated simultaneously with a cue. In initial

position ‘B’ the same ﬁxed set of background distractors is activated, but the cue from

location ‘A’ is not activated and instead a different unit is activated. In stage 2 an action

must be selected by the animal from some possible alternatives. The correct action to

choose depends on the cue presented in stage 1, i.e. the initial location. Here the animal

must turn right at the junction if it started in one location and left if it started in the

other location. In stage 3 reward is given to the animal if the action made in stage 2 is

the correct one. The basic idea of the model is that the cue presented in a layer P in

stage 1 is gated into working memory in a recurrent layer ‘Q’, see Fig.1(b). P refers to

primary since this layer simply reﬂects the external environment, while Q is the layer

after P . The working memory of the cue is reactivated in layer Q during the action

selection stage 2. The reactivation is itself driven by an external trigger signal which is

also presented in layer P and given by the external environment, see Fig.1(a). During

this stage 2 the reactivated cue drives a winner-take-all (WTA) action selection system

in layer ‘M ’ which activates the action which is driven strongest by the cue. Layer Q is

connected to layer M in an all-to-all fashion so the action j activated in layer M when

the cell i is activated in layer Q depends on the synaptic weights $J_{ij}$ between layers Q

and M. The action j selected is given by the strongest weight, $\max_j J_{i=\mathrm{cue},j}$.
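
As a minimal illustration of this selection rule, the following Python sketch picks the action whose weight from the active Q cell is largest. The weight values, the index layout (rows are Q cells, columns are actions), the left/right labels and the function name `select_action` are assumptions made for the example, not values from the model.

```python
def select_action(J, cue_index):
    """Return the index j of the action with the largest weight J[cue_index][j].

    J is a list of rows, one row per Q-layer cell and one column per M-layer
    action (an assumed layout). Ties go to the lowest action index.
    """
    row = J[cue_index]
    return max(range(len(row)), key=lambda j: row[j])


# Hypothetical weights: 3 Q cells, 2 actions (0 = left, 1 = right, labels assumed).
J_example = [
    [0.2, 0.9],
    [0.7, 0.1],
    [0.5, 0.5],
]
chosen = [select_action(J_example, i) for i in range(3)]
```

In the full model the same choice emerges from the winner-take-all dynamics of the M layer rather than from an explicit maximum.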

As explained in [21,22] the cue in working memory in layer Q is again reactivated in

stage 3 and reward is given simultaneously if the action made in stage 2 is correct. The

reactivation is itself driven by another end signal in the P layer, see Fig.1(a). Since the

second reactivation activates the same Q layer cell i as stage 2, this also reactivates the

Fig. 1. (a) Task structure described in the text. The animal runs around the track and the task is

simpliﬁed to three stages shown as solid boxes on the track. The cue presentation stage 1 is at

the initial location, the action selection stage 2 ($T_{junction}$) is when the junction is sensed and

the end stage 3 ($T_{end}$) is when the end of the track is sensed. Reactivations of stimuli presented

in stage 1 occur in stages 2 and 3 together with activation of the action selecting M (MSN)

cells (G(t) = 1) in the latter halves of these stages, depicted by hatched boxes. Here the task

is depicted as an alternation task, but in this paper we study the more difﬁcult task where the

animal is removed from the end location and randomly placed at one of the two initial locations

each trial. (b) Anatomy of the model system described in the text. The primary P and recurrent Q

layers are suggested to be part of cortex or hippocampus which projects to striatum, while the M

layer where the cells coding for actions are located may be striatum medium spiny neurons

(MSN). The N layer cell is not shown but is also suggested to be striatum projecting to thalamus.


same action j. If reward is present the synapse $J_{ij}$ is strengthened, while if reward

is absent the same synapse is weakened. This process therefore targets a particular

synapse for reinforcement or punishment. Regardless of whether reward is received or

not, after a longer delay the next trial starts with one of the cues chosen at random

presented together with some distractors. In [21,22] we show that the system can easily

ﬁnd the correct cue to action allocation by an exploratory process using the punishment

signal to depress selected but unwanted actions. When the correct cue-action allocation

is discovered it is stabilized by dopamine negative feedback which limits the growth in

the weights $J_{ij}$.

A drawback of the model presented in [21,22] was that the distractors and cue pre-

sented in stage 1 had to be artiﬁcially distinguished by an incentive salience level which

was set to be high for the cue and low for the distractors, which were presented simul-

taneously. In fact the working memory was a winner-take-all system so that the unit

activated with maximum salience during stage 1 was the unit gated into working mem-

ory to be reactivated in stages 2 and 3. Only if the cue had sufﬁciently high relative

salience was it gated into working memory. If its salience was too low, a distractor

could be gated in instead. Here we address this defect and therefore consider the problem

of cue extraction from among distractors, where all stimuli are presented externally

with the same salience. That is, we allow the system to discover the cue and give it a high

salience, while the distractors are given a low salience.

The cue extraction model described here is comprised of the P and Q layers shown in

Fig.1(b) together with the thalamus T and the dopamine system. The Q layer activities $q_i(t)$ are given by,

$$\frac{dq_i}{dt} = -k_1\, q_i + k_2\, g\Big(\sum_{j \neq i} w_{ij}(t)\, q_j\Big)\, f(T(t) - T_0) + k_3\, x_i(t)\, P_i(t)\, f(T_0 - T(t)). \qquad (1)$$

In this equation the $q_i(t)$ can be considered firing rates or membrane potentials. The first term on the RHS represents the exponential decay of $q_i(t)$ back to zero with rate $k_1$ when there is no activation. The second term models the effect of the all-to-all modifiable recurrent collaterals with weights $w_{ij}(t)$, where $g(x)$ is the sigmoidal function which provides the non-linearity and limits the activation of this term. The weights $w_{ij}(t)$ are given by,

$$\frac{dw_{ij}}{dt} = k_4\,[\,\cdots\,] - k_5\, w_{ij}(t). \qquad (2)$$

The combination of Eq.1 and Eq.2 describes a winner-take-all system as described in more detail in [21,22]. The $k_5$ term is an exponential decay which reverts the $w_{ij}(t)$ to zero between trials.

The third term in Eq.1 is the one-to-one input from the primary P layer. In this third term $P_i(t) = 1, 0$ are the activities of P layer cells which respond directly to the external environment and are unity when active and zero at other times. As described above, during the cue presentation period one of the two cue cells is set to unity while the other is set to zero, e.g. $P_{cueA} = 1$, $P_{cueB} = 0$, and the distractors are all set to unity. Here we have five distractors and two cues so that the total number of P cells and Q cells is $N_P = N_Q = 7$. The factor $x_i(t)$ in the third term of Eq.1 is the modifiable


salience of input $P_i$. The input with the maximum salience from among those presented is the input gated into working memory. We wish $x_i$ to be high for the cues and low for the distractors. It is this variable which was artificially and externally fixed in the previous modeling [22]; we describe it below.

The factors $f(T(t) - T_0)$ and $f(T_0 - T(t))$ in the second and third terms of Eq.1 model the reactivation and P layer input down-gating respectively. $T(t)$ is considered to be the activity of the thalamus while $T_0$ is its baseline activity, and $f(x) = x$ when $x > 0$ and $f(x) = 0$ otherwise is a positivity function. When the thalamus activity is above baseline, $T(t) > T_0$, the recurrent collateral term is activated, which causes the reactivation of the cue presented in stage 1. At these times the input from the P layer is down-gated so that any input active at reactivation times does not interfere with the reactivation of the previously presented cue. On the other hand, when $T(t) \le T_0$ there is no reactivation and the activity of the Q layer cells is driven and determined by their topographic inputs $P_i$. The thalamus activity is given by,

$$\frac{dT}{dt} = -T(t) + T_0 + T_{junction} + T_{end}. \qquad (3)$$

This describes activation of the thalamus only when the animal is at the junction stage 2 ($T_{junction} = 1$) and the end stage 3 ($T_{end} = 1$), see Fig.1(b), where reward is located. In fact we suggest the animal has already learned to make these reactivations and they are represented in striatal cells projecting to thalamus. In the discussion section we describe how the $T_{end}$ signal can be learned by the animal.
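
To make the gating concrete, the thalamic relaxation can be solved in closed form. The short derivation below is a sketch under stated assumptions: it reads Eq.3 as $dT/dt = -T + T_0 + T_{junction} + T_{end}$ with unit time constant, writes the baseline as $T_0$ (a label chosen here), and assumes that $T_{junction}$ switches from 0 to 1 at a time $t_1$ while $T_{end} = 0$ and $T(t_1) = T_0$.

```latex
\begin{align*}
\frac{dT}{dt} &= -T + T_0 + 1, \qquad T(t_1) = T_0, \\
T(t) &= T_0 + 1 - e^{-(t - t_1)}, \qquad t \ge t_1, \\
f\big(T(t) - T_0\big) &= 1 - e^{-(t - t_1)} \;\longrightarrow\; 1 .
\end{align*}
```

Under these assumptions the reactivation gate rises smoothly from 0 towards 1 while the junction signal is on. Once the signal switches off at some later time $t_2$, the excess $T(t) - T_0$ decays as $e^{-(t - t_2)}$ times its value at $t_2$.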

The important variables are the saliences $x_i$. Here we suggest that the $x_i$ can be appropriately modeled by,

$$\frac{dx_i}{dt} = \tau\, q_i(t)\,\big(D(t) - D_0\big). \qquad (4)$$

Here $\tau$ is a slow timescale generating learning over many trials and the term $(D(t) - D_0)$ is the excess of the dopamine level $D(t)$ over its baseline $D_0$, where $D(t)$ is given by,

$$\frac{dD}{dt} = -D(t) + D_0 - k_6 \sum_{i=1}^{N_Q} q_i(t) + k_7\, R(t) + \text{other terms}. \qquad (5)$$

This equation includes an inhibitory term as the sum over the Q layer activities and an excitatory term $k_7 R(t)$ which describes the primary reward, activated in stage 3 during the end period where $T_{end} = 1$, when the animal makes a correct action in stage 2. It is easy to understand how the pair of equations Eq.4 and Eq.5 produce the desired behaviour. Suppose the cue has a higher salience $x_{cue}$ than the distractors it is presented together with. Then it will drive the corresponding $q_{cue}$ during the cue presentation period stage 1 more than the other $q_i$ are driven and will therefore be gated into working memory and reactivated at the action selection stage 2 and the end stage 3. Suppose this cue also drives the correct action; then dopamine will be positively activated in stage 3 by the $R(t)$ term during the second reactivation of the $q_{cue}$. Eq.4 will be positive for the reactivated $q_{cue}$ and that particular $x_{cue}$ will be increased by LTP. Therefore the next time it occurs the cue $q_{cue}$ will be even more likely to be gated into working memory. Indeed the salience $x_{cue}$ for this cue has a positive feedback


and would grow without bound except for the fact that during the cue presentation period LTD is generated by the inhibition of dopamine by the term $-k_6 \sum_{i=1}^{7} q_i(t)$ in Eq.5, which depends on the magnitude $x_{cue}$ and will therefore limit the growth of the salience $x_{cue}$ to a fixed value. Therefore we see that there is a stable attractor state for the cue. However the situation is different for the distractors. Suppose one of the distractors has the maximum salience $x_{dist1}$ from among the presented stimuli. This $q_{dist1}$ will be gated into working memory and drive a given action in stage 2 and be reactivated at the end period 3. However since this is not a cue there is only a fifty-fifty chance that it is driving the correct action for the cue it happens to be presented with, and primary reward will therefore only be found half the time. During trials where primary reward is not found the reactivation at the end period 3 will actually inhibit dopamine, Eq.5, and cause a suppression of $x_{dist1}$, Eq.4, during the reactivation. $x_{dist1}$ will wander around depending on chance sequences of trials without a stable fixed value, but with a long time average value well below what can be attained for $x_{cue}$. After some time $x_{dist1}$ will drop down sufficiently for another distractor, $x_{dist2}$, to become maximal and be gated into working memory and then tested over several trials, eventually being suppressed. Once $x_{cue}$ has been selected however, if it drives the correct action, it will quickly attain its maximum stable attractor value.
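
The argument above can be illustrated with a deliberately simplified, trial-level sketch in Python. It is not the differential-equation model: each trial collapses Eq.4 and Eq.5 into a single update of the winning stimulus, `x += eta * (R * reward - c * x)`, where the `-c * x` part stands in for the dopamine inhibition that scales with salience. The update rule, all parameter values, the initial saliences, and the assumption that both cues already drive their correct actions while every distractor always drives the same action are illustrative choices, not taken from the paper.

```python
import random


def run_trials(n_trials=5000, eta=0.1, R=2.0, c=1.0, seed=0):
    """Trial-level caricature of salience learning (all values assumed)."""
    rng = random.Random(seed)
    # Indices 0 and 1 are cue A and cue B; indices 2..6 are distractors.
    # Distractors start with higher salience so they are tested first.
    x = [1.0, 1.0, 1.2, 1.1, 1.3, 1.15, 1.05]
    distractors = [2, 3, 4, 5, 6]
    for _ in range(n_trials):
        cue = rng.choice([0, 1])            # random start location
        presented = [cue] + distractors     # one cue plus all distractors
        # The presented stimulus with maximal salience is gated into memory.
        winner = max(presented, key=lambda i: x[i])
        # Cues are assumed to drive the correct action already. A distractor
        # always drives the action that is correct for cue A only, so it is
        # rewarded on about half of the trials.
        rewarded = True if winner == cue else (cue == 0)
        # Reward raises the winner's salience; the -c*x term bounds it.
        x[winner] += eta * (R * (1.0 if rewarded else 0.0) - c * x[winner])
    return x


saliences = run_trials()
cue_saliences = saliences[:2]
distractor_saliences = saliences[2:]
```

In this sketch a stimulus that is always rewarded settles at `R / c`, while one rewarded on half the trials fluctuates around `R / (2 * c)`, which mirrors the stable attractor for the cue and the wandering, lower salience of a distractor.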

We now also describe the action selection system presented previously in [21,22]. The system is described by the M layer which takes an all-to-all projection from the Q layer and forms a winner-take-all system,

$$\frac{dM_i}{dt} = -M_i + f\Big(\sum_{j} J_{ji}(t)\, q_j - k_8 \sum_{j=1}^{N_M} M_j + k_9\, M_i\Big)\, G(t). \qquad (6)$$

The M units represent the actions and here we have two of them, $N_M = 2$, corresponding to right and left turns. This equation describes a standard winner-take-all system. The term $G(t)$ is set to unity when actions are allowed to be selected and zero otherwise, see Fig.1(a). We suggest it is represented in striatum in a similar way to $T_{junction}$ and $T_{end}$ in Eq.3, and the reason it is different from $T_{junction}$ and $T_{end}$ will be described in more detail below. The actual action selected is given by the sign of $F(t)$, fixed by integration of the M units over the junction period $T_{junction} = 1$,

$$\frac{dF(t)}{dt} = T_{junction}\,\big(M_{i=1}(t) - M_{i=N_M}(t)\big) - (1 - T_{junction})\,F(t). \qquad (7)$$
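
A small Python sketch of this readout is given below, using forward Euler for brevity. It assumes Eq.7 has the form $dF/dt = T_{junction}(M_1 - M_2) - (1 - T_{junction})F$ for the two-action case, holds the M activities constant during the junction period for simplicity, and maps positive $F$ to a right turn, which is an arbitrary labelling.

```python
def integrate_readout(m1, m2, junction_steps, rest_steps, dt=0.01):
    """Integrate F through a junction period, then through a rest period.

    m1, m2: constant activities of the two M units while T_junction = 1
            (constant values are an assumption of this sketch).
    Returns (F at the end of the junction period, F after the rest period).
    """
    F = 0.0
    for _ in range(junction_steps):      # T_junction = 1: accumulate M1 - M2
        F += dt * (m1 - m2)
    F_decision = F
    for _ in range(rest_steps):          # T_junction = 0: F decays to zero
        F += dt * (-F)
    return F_decision, F


F_decision, F_rest = integrate_readout(m1=1.0, m2=0.0,
                                       junction_steps=100, rest_steps=500)
action = "right" if F_decision > 0 else "left"   # sign-to-turn mapping assumed
```

The decay outside the junction period means that F carries no memory of the previous trial's choice into the next one.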

As described, the synaptic weights $J_{ij}$ from the Q to the M layer are updated by three-way Hebbian learning [23,24] which is reinforced in the presence of dopamine and depressed without dopamine,

$$\frac{dJ_{ij}}{dt} = -J_{ij}(t) + f\big((D(t) - D_0)\, q_i(t)\, M_j(t) + J_{ij}(t)\big). \qquad (8)$$

The dopamine system Eq.5 is extended to include a negative feedback projection from the M cells, $-k_{10} \sum_j M_j(t)$. As described in [21,22] this system can find and stabilize


the correct cue-action allocation provided the cues are known. Together with the cue

extraction model described above, the cue is successfully extracted and bound to the

correct action. We now illustrate this model by numerical simulations integrated by

fourth order Runge-Kutta.
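
For reference, a generic classical fourth-order Runge-Kutta step can be sketched in Python as follows. The test problem, step size, baseline value and parameter names are assumptions for illustration: the thalamus equation Eq.3 is read as $dT/dt = -T + T_0 + T_{junction}$ with the junction signal held on, which is not the full model.

```python
def rk4_step(f, t, y, dt):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * dt, y + 0.5 * dt * k1)
    k3 = f(t + 0.5 * dt, y + 0.5 * dt * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


T0 = 0.5           # assumed baseline value
T_JUNCTION = 1.0   # junction signal held on for the whole run (assumed)


def thalamus_rhs(t, T):
    """Right-hand side of the assumed form of Eq.3 with T_end = 0."""
    return -T + T0 + T_JUNCTION


T, t, dt = T0, 0.0, 0.01
for _ in range(1000):            # integrate over 10 time units
    T = rk4_step(thalamus_rhs, t, T, dt)
    t += dt
```

After ten time units the activity has relaxed to within about $e^{-10}$ of its driven steady state $T_0 + 1$.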
