Lecture Notes in Computer Science

bet	57/88
Sana	16.12.2017
Hajmi	12.42 Mb.
	#22381

1 ... 53 54 55 56 57 58 59 60 ... 88

5 Conclusion

Fig. 5. Subject distribution of estimated meta-parameters, larning rate,

, action stochasticity

(inverse temperature),

, and discount rate,

γ.

Left panel: distribution on

α−γ

space. Subject LI,

NN, and NT (Left panel, indicated by inside of ellipsoid), were trapped to local optimal action

sequence. Right panel: distribution on

α−β

space. Subject BB and LI (right panel, indicated by

inside of ellipsoid) were reported that they could not find any confident strategy.

Estimating Internal Variables of a Decision Maker’s Brain

603

5 Conclusion

Theoretical framework of reinforcement learning to model behavioral decision

making and the Bayesian estimating method for subjective internal variable can be

powerful tools for analyzing both neural recording (Samejima et al., 2005) and human

imaging data (Daw et al., 2006; Pessiglione et al., 2006; Tanaka et al., 2006).

Especially, tracking meta-parameter of RL can capture behavioral tendency of animal

or human decision making. Recently, correlation with anterior cingulated cortex

activity and learning rate in uncertain environmental change are reported by using the

approach with Bayesian decision model with temporal evolving the parameter of

learning rate (Behrens et al., 2007).

Although not detailed in this paper, Bayesian estimation framework also provides a

way of objectively selecting the best model in reference to the given data.

Combination of Bayesian model selection and hidden variable estimation methods

would contribute to a new understanding of decision mechanism of our brain through

falsifiable hypotheses and objective experimental tests.

References

1. Behrens, T.E., Woolrich, M.W., Walton, M.E., Rushworth, M.F.: Learning the value of

information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007)

2. Corrado, G., Doya, K.: Understanding neural coding through the model-based analysis of

decision making. J. Neurosci. 27, 8178–8180 (2007)

3. Daw, N.D., O’Doherty, J.P., Dayan, P., Seymour, B., Dolan, R.J.: Cortical substrates for

exploratory decisions in humans. Nature 441, 876–879 (2006)

4. Doucet, A., Freitas, N., Gordon, N.: Sequential Monte Carlo Methods in Practice.

Springer, Heidelberg (2001)

5. Haruno, M., Kuroda, T., Doya, K., Toyama, K., Kimura, M., Samejima, K., Imamizu, H.,

Kawato, M.: A neural correlate of reward-based behavioral learning in caudate nucleus: a

functional magnetic resonance imaging study of a stochastic decision task. J. Neurosci. 24,

1660–1665 (2004)

6. Pessiglione, M., Seymour, B., Flandin, G., Dolan, R.J., Frith, C.D.: Dopamine-dependent

prediction errors underpin reward-seeking behaviour in humans. Nature 442, 1042–1045

(2006)

7. Samejima, K., Doya, K., Ueda, Y., Kimura, M.: Advances in neural processing systems,

vol. 16. The MIT Press, Cambridge, Massachusetts, London, England (2004)

8. Samejima, K., Ueda, Y., Doya, K., Kimura, M.: Representation of action-specific reward

values in the striatum. Science 310, 1337–1340 (2005)

9. Schultz, W., Dayan, P., Montague, P.R.: A neural substrate of prediction and reward.

Science 275, 1593–1599 (1997)

10. Sutton, R.S., Barto, A.G.: Reinforcement Learning. The MIT press, Cambridge (1998)

11. Tanaka, S.C., Samejima, K., Okada, G., Ueda, K., Okamoto, Y., Yamawaki, S., Doya, K.:

Brain mechanism of reward prediction under predictable and unpredictable environmental

dynamics. Neural Netw. 19, 1233–1241 (2006)

Visual Tracking Achieved by Adaptive Sampling

from Hierarchical and Parallel Predictions

Tomohiro Shibata

, Takashi Bando

, and Shin Ishii

1,3

1

Graduate School of Information Science, Nara Institute of Science and Technology

tom@is.naist.jp

DENSO Corporation

Graduate School of Informatics, Kyoto University

Abstract. Because the inevitable ill-posedness exists in the visual infor-

mation, the brain essentially needs some prior knowledge, prediction, or

hypothesis to acquire a meaningful solution. From computational point of

view, visual tracking is the real-time process of statistical spatiotemporal

ﬁltering of target states from an image stream, and incremental Bayesian

computation is one of the most important devices. To make Bayesian

computation of the posterior density of state variables tractable for any

types of probability distribution, Particle Filters (PFs) have been often

employed in the real-time vision area. In this paper, we brieﬂy review

incremental Bayesian computation and PFs for visual tracking, indicate

drawbacks of PFs, and then propose our framework, in which hierarchical

and parallel predictions are integrated by adaptive sampling to achieve

appropriate balancing of tracking accuracy and robustness. Finally, we

discuss the proposed model from the viewpoint of neuroscience.

Introduction

Because the inevitable ill-posedness exists in the visual information, the brain

essentially needs some prior knowledge, prediction, or hypothesis to acquire a

meaningful solution. The prediction is also essential for real-time recognition

or visual tracking. Due to ﬂood of visual data, examining the whole data is

infeasible, and ignoring the irrelevant data is essentially requisite. Primate fovea

and oculomotor control can be viewed from this point; high visual acuity is

realized by the narrow foveal region on the retina, and the visual axis has to

actively move by oculomotor control.

Computer vision, in particular real-time vision faces the same computational

problems discussed above, and attractive as well as feasible methods and appli-

cations have been developed in the light of particle ﬁlters (PFs) [4]. One of key

ideas of PFs is importance sampling distribution or proposal distribution which

can be viewed as prediction or attention in order to overcome the discussed

computational problems.

The aim of this paper is to propose a novel Bayesian visual tracking frame-

work for hierarchically-modeled state variables for single object tracking, and to

discuss the PFs and our framework from the viewpoint of neuroscience.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 604–613, 2008.

c Springer-Verlag Berlin Heidelberg 2008

Visual Tracking Achieved by Adaptive Sampling

605

Adaptive Sampling from Hierarchical and Parallel

Predictions

2.1

Incremental Bayes and Particle Filtering

Particle ﬁltering [4] is an approach to performing Bayesian estimation of in-

tractable posterior distributions from time series signals with non-Gaussian

noise, such to generalize the traditional Kalman ﬁltering. This approach has been

attracting attention in various research areas, including real-time visual process-

ing (e.g., [5]). In clutter, there are usually several competing observations and

these causes the posterior to be multi-modal and therefore non-Gaussian.

In reality, using a large number of particles is not allowed especially for real-

time processing, and thus there is strong demand to reduce number of particles.

Reducing the number of particles, however, can lead to sacriﬁcing accuracy and

robustness of ﬁltering, particularly in case that the dimension of state variables

is high. How to cope with this trade-oﬀ has been one of the most important com-

putational issues in PFs, but there have been few eﬀorts to reduce the dimension

of the state variables in the context of PFs.

Making use of hierarchy in state variables seems a natural solution to this

problem. For example, in the case of pose estimation of a head from a video-

stream involving state variables of three-dimensional head pose and the two-

dimansional position of face features on the image, the high-dimensional state

space can be divided into two groups by using the causality in their states,

i.e., the head pose strongly aﬀects the position of feace features (cf. Fig. 1,

except dotted arrows). There are, however, two big problems in this setting.

First, the estimation of the lower state variables is strongly dependent on the

estimation of the higher state variables. In real applications, it often happens

that the assumption on generating from the higher state variable to the lower

state variable is violated. Second, the assumption on generating from the lower

state variable to the input, typically image frames, can also be violated. These

two problems lead to failure in the estimation.

2.2

Estimation of Hierarchically-Modeled State Variables

Here we present a novel framework for hierarchically-modeled state variables

for single object tracking. The intuition of our approach is that the higher and

lower layer have their own dynamics respectively, and mixing their predictions

over the proposal distribution based on their reliability adds both robustness

and accuracy in tracking with the fewer number of particles.

We assume there are two continuous state vectors at time step t, denoted by

∈ R

and x

∈ R

, and they are hierarchically modeled as in Fig. 1. Our goal

is then estimating these unobservable states from an observation sequence z

1:t

State Estimation. According to Bayes rule, the joint posterior density

p(a

t

, x

1:t

) is given by

p(a

, x

1:t

)

∝p(z

, x

)p(a

, x

1:t

−1

606

T. Shibata, T. Bando, and S. Ishii

Fig. 1. A graphical model of a hierarchical and parallel dynamics. Right panels depict

example physical representations processed in the layers.

where p(a

, x

1:t

−1

) is the joint prior density, and p(z

|a

t

, x

) is the likelihood.

The joint prior density p(a

, x

1:t

−1

) is given by the previous joint posterior

density p(a

−1

, x

−1

1:t

−1

) and the state transition model p(a

, x

−1

, x

−1

)

as follows:

p(a

, x

1:t

−1

)

p(a

, x

−1

, x

−1

)p(a

−1

, x

−1

1:t

−1

)da

−1

The state vectors a

and x

are assumed to be genereted from the hierarchical

model shown in Fig. 1. Furthermore, dynamics model of a

is assumed to be repre-

sented as a temporal Markov chain, conditional independence between a

−1

and x

given a

. Under these assumptions, the state transition model p(a

, x

−1

, x

−1

)

and the joint prior density p(a

, x

1:t

−1

) can be represented as

p(a

t

, x

−1

, x

−1

) =p(x

)p(a

−1

)

and

p(a

, x

1:t

−1

) =p(x

)p(a

1:t

−1

respectively.

Then, we can carry out hierarchically computation of the joint posterior dis-

tribution by PF as follows; ﬁrst, the higher samples

(i)

|i = 1, ..., N

} are drawn

from the prior density p(a

1:t

−1

). The lower samples

(j)

|j = 1, ..., N

} are

then drawn from the prior density p(x

1:t

−1

) described as the proposal dis-

tribution p(x

(i)

). Finally, the weights of samples are given by the likelihood

p(z

, x

Adaptive Mixing of Proposal Distribution. For applying to the state

estimation problem in the real world, the above proposal distribution can be

contaminated by the cruel non-Gaussian noise and/or the declination of the

assumption. Especially in the case of PF estimation with small number of par-

ticles, the contaminated proposal distribution can give fatal disturbance to the

Visual Tracking Achieved by Adaptive Sampling

607

estimation. In this study, we assumed the state transition, which are represented

as dotted arrows in Fig.1, in lower layer independently of upper layer, and we

suppose to enable robust and acculate estimation by adaptive determination

of the contribution the from hypothetical state transition. The state transition

p(a

t

, x

−1

, x

−1

) can be represented as

p(a

, x

−1

, x

−1

) =p(x

, x

−1

)p(a

−1

(1)

Here, the dynamics model of x

is assumed to be represented as

p(x

, x

−1

) =α

a,t

p(x

) + α

x,t

p(x

−1

(2)

with α

a,t

+ α

x,t

= 1. In our algorithm, p(x

|a

t

, x

−1

) is modelled as a mixture

of approximated prediction densities computed in the lower and higher layers,

p(x

t

|x

−1

) and p(x

). Then, its mixture ratio α

{α

a,t

, α

x,t

} representing

the contribution of each layer is determined by means of which enables the

interaction between the layers. We describe the determination way of the mixture

ratio α

in the following subsection. From Eqs. (1) and (2), the joint prior density

p(a

t

, x

1:t

−1

) is given as

p(a

t

, x

1:t

−1

) = p(x

, z

1:t

−1

)p(a

1:t

−1

)

= π(x

|α

)p(a

1:t

−1

where

π(x

|α

) =α

a,t

p(x

) + α

x,t

p(x

1:t

−1

)

(3)

is the adaptive-mixed proposal distribution for x

based on the prediction den-

sities p(x

) and p(x

|z

1:t

−1

Determination of α

using On-line EM Algorithm. The mixture ratio

t

is the parameter for determining the adaptive-mixed proposal distribution

π(x

|α

), and its determination by minimalizing the KL divergence between

the posterior density in the lower layer p(x

, z

1:t

) and π(x

|α

t

) gives robust

and accurate estimation. The determination of α

is equal to determination of

mixture ratio in two components mixture model, and we employ a sequential

Maximum-Likelihood (ML) estimation of the mixture ratio. In our method, the

index variable for componet selection becomes the latent variable, and therefore

the sequential ML estimation is implemented by means of an on-line EM algo-

rithm [11]. Resampling from the posterior density p(x

, z

1:t

), we obtain N

samples ˜

(i)

. Using the latent variable m =

a

, m

} indicates which prediction

density, p(x

) or p(x

1:t

−1

) is trusted, the on-line log likelihood can then be

represented as

L(α

) = η

τ =1

s=τ +1

i=1

log π(˜

(i)

|α

)

= η

τ =1

s=τ +1

i=1

log

p(˜

(i)

, m

|α

)

608

T. Shibata, T. Bando, and S. Ishii

1. Estimation of the state variable x

in the lower layer.

– Obtain α

a,t

samples x

(i)

a,t

from p(x

– Obtain α

x,t

samples x

(i)

x,t

from p(x

1:t

−1

– Obtain expectation ˆ

and Std(x

) of p(x

n,t

, z

1:t

) using N

mixture

samples x

(i)

, constituted by x

(i)

x,t

and x

(i)

a,t

Above procedure is applied to each feature, and obtain

{ˆx

n,t

, Std(x

n,t

)

2. Estimation of the state variable a

in the higher layer.

– Obtain D

a,n

(t) based on Std(x

n,t

), and then estimate p(a

1:t

3. Determination of the mixture ratio

t+1

– Obtain N

samples ˜

(i)

from p(x

n,t

, z

1:t

– Calculate

t+1

such to maximize the on-line log likelihood.

Above procedure is applied to each feature, and obtain

{¸

n,t+1

Fig. 2. Hierarchical pose estimation algorithm

where λ

is a decay constant to decline adverse inﬂuence from former inaccurate

estimation, and η

τ =1

s=τ +1

−1

is a normalization constant which

works as a learning coeﬃcient. The optimal mixture ratio α

∗

t

by means of the

KL divergence, which gives the optimal proposal distribution π(x

|α

∗

), can be

calculated by miximization of the on-line log likelihood as follows:

∗

m,t

∈m

where

p(˜

(i)

, m

|α

) =α

a,t

p(˜

(i)

p(˜

(i)

, m

|α

) = α

x,t

p(˜

(i)

1:t

−1

and

= η

τ =1

s=τ +1

i=1

p(m

|˜x

(i)

, α

p(m

|˜x

(i)

, α

) =

p(˜

(i)

, m

|α

)

∈m

p(˜

(i)

, m

|α

)

Note that

t

can be calculated incrementally.

2.3

Application to Pose Estimation of a Rigid Object

Here the proposed method is applied to a real problem, pose estimation of a rigid

object (cf. Fig. 1). The algorithm is shown in Fig. 2. N

features on the image

plane at time step t are denoted by x

n,t

= (u

n,t

, v

n,t

)

, (n = 1, ..., N

), an aﬃne

camera matrix which projects a 3D model of the object onto the image plane

at time step t is reshaped into a vector form a

= (a

1,t

, ..., a

8,t

)

, for simplicity,

and an observed image at time step t is denoted by z

.

When applied to the pose estimation of a rigid object, Gaussian process is

assumed in the higher layer and the object’s pose is estimated by Kalman Filter,

Visual Tracking Achieved by Adaptive Sampling

609

while tracking of the features is performed by PF because of the existence of cruel

non-Gaussian noise, e.g. occlusion, in the lower layer.

To obtain the samples x

(i)

n,t

from the mixture proposal distribution we need

two prediction densities, p(x

n,t

) and p(x

n,t

1:t

−1

). The prediction density

computed in the higher layer, p(x

n,t

), which contains the physical relationship

between the features, is given by the aﬃne projection process of the 3D model

of the rigid object.

2.4

Experiments

Simulations. The goal of this simulation is to estimate the posture of a rigid

object, as in Fig. 3A, from an observation sequence of eight features. The rigid

object was a hexahedral piece, whose size was 20 (upper base)

× 30 (lower

base)

× 20 (height) × 20 (depth) [mm], was rotated at 150mm apart from a

pin-hole camera (focal length: 40mm) and was projected onto the image plane

(pixel size: 0.1

× 0.1 [mm]) by a perspective projection process disturbed by

a Gaussian noise (mean: 0, standard deviation: 1 pixel). The four features at

back of the object were occluded by the object itself. An occluded feature was

represented by assuming the standard deviation of measurement noise grows to

100 pixels from a non-occluded value of 1 pixel. The other four features at front

of the object were not occluded. The length of the observation sequence of the

features was 625 frames, i.e., about 21 sec.

In this simulation, we compared the performance of our proposed method

with (i) the method with a ﬁxed α

{1, 0} which is equivalent to the simple

hierarchical modeling in which the prediction density computed in the higher

layer is trusted every time, (ii) the method with a ﬁxed α

{0, 1} which does

not implement the mutual interaction between the layers. The decay constant

of the on-line EM algorithm was set at λ

= 0.5.

The estimated pose by the adaptive proposal distribution and α

a,t

at each

time step are shown in Figs. 3B and 3C, respectively. Here, the object’s pose

t

=

{θ

X,t

, θ

Y,t

, θ

Z,t

} was calculated from the estimated a

using Extended

Kalman Filter (EKF). In our implementation, maximum and minimum values of

a,t

were limited to 0.8 and 0.2, respectively, to prohibit the robustness from de-

generating. As shown in Fig. 3B, our method achieved robust estimation against

the occlusion. Concurrently with the robust estimation of the object’s pose, ap-

propriate determination of the mixture ratio was exhibited. For example, as in

the case of the feature x

, the prediction density computed in the higher layer

was emphasized and well predicted using the 3D object’s model during the period

in which x

was occluded, because the observed feature position contaminated

by cruel noise depressed conﬁdence of the lower prediction density.

Real Experiments. To investigate the performance against the cruel non-

Gaussian noise existing in real environments, the proposed method was applied

to a head pose estimation problem of a driver from a real image sequence cap-

tured in a car. The face extraction/tracking from an image sequence is a well-

studied problem because of its applicability to various area, and then several

610

T. Shibata, T. Bando, and S. Ishii

Fig. 3. A: a sample simulation image. B: Time course of the estimated object’s

pose. C: Time course of α

a,t

determined by the on-line EM algorithm. Gray back-

ground represents a frame in which the feature was occluded.

PF algorithms have been proposed. However, for accurate estimation of state

variables, e.g. human face, lying in a high dimensional space especially in the

case of real-time processing, some techniques for dimensional reduction are re-

quired. The proposed method is expected to enable more robust estimation in

spite of limitation in computing resource by exploiting hierarchy of state vari-

ables. The real image sequence was captured by a near-infrared camera at the

back of a handle, i.e., the captured images did not contain color information,

and the image resolution was 640

× 480.

In such a visual tracking task, the true observation process p(z

n,t

) is un-

known because the true positions of face features are unobservable, hence, a

model of the observation process for calculating the particle’s weight w

(i)

n,t

needed. In this study, we employed normalized correlation with a template as

the model of the approximate observation process. Although this observation

model seems too simple to apply to problems in real environments, it is suf-

ﬁcient for examining the eﬃciency of the proposal distribution. We employed

nose, eyes, canthi, eyebrows, corners of mouth as the face features. Using the

3D feature positions measured by a 3D distance measuring equipment, we con-

structed the 3D face model. The proposed method was applied by employing

50 particles for each face feature, as well as in the simulation, and processed

by a Pentium 4 (2.8 GHz) Windows 2000 PC with 1048MB RAM. Our system

processed one frame in 29.05 msec, and hence achieved real-time processing.

Visual Tracking Achieved by Adaptive Sampling

611

Fig. 4. Mean estimation error in the case when α

was estimated by EM, ﬁxed at

a,t

{1, 0}, and ﬁxed at α

a,t

=

{0, 1} (100 trials). Since some bars protruded from

the ﬁgure, they were shorten and the error amount is instead displayed on the top of

them.

Fig. 5. Tracked face features

Fig. 4 shows the estimation error of the head pose, and the true head pose of

the driver measured by a gyro sensor in the car. Our adaptive mixing proposal

distribution achieved robust head pose estimation as well as in the simulation

task. In Fig. 5, the estimated face features are depicted by “+” for various head

pose of the driver; the variance of the estimated feature position is represented

as the size of the “+” mark, i.e., the larger a “+” is, the higher estimation

conﬁdence the estimator has.

Modeling Primate’s Visual Tracking by Particle Filters

Here we note that our computer vision study and primates’ vision share the same

computational problems and similar constraints. Namely, they need to perform

real-time spatiotemporal ﬁltering of the visual data robustly and accurately as

612

T. Shibata, T. Bando, and S. Ishii

much as possible with the limited computing resource. Although there are huge

numbers of neurons in the brain, their ﬁring rate is very noisy and much slower

than recent personal computers. We can visually track only around four to six

objects simultaneously (e.g., [2]). These facts indicate the limited attention re-

source. As mentioned at the beginning this paper, it is widely known that only

the foveal region on the retina can acquire high-resolution images in primates,

and that humans usually make saccades mostly to eyes, nose, lip, and contours

when we watch a human face. In other words, primates actively ignore irrelevant

information against massive image inputs. Furthermore, there have been many

behavioral and computational studies reporting that the brain would compute

Bayesian statistics (e.g, [8][10]). As we discussed, however, Bayesian computation

is intractable in general, and particle ﬁlters (PFs) is an attractive and feasible

solution to the problem as it is very ﬂexible, easy to implement, parallelisable.

Importance sampling is analogous to eﬃcient computing resource delivery.

As a whole, we conjecture that the primate’s brain would employ PFs for

visual tracking. Although one of the major drawbacks of PF is that a large

number of particles, typically exponential to the dimension of state variables,

are required for accurate estimation, our proposed framework in which adaptive

sampling from hierarchical and parallel predictive distributions can be a solution.

As demonstrated in section 2 and in other’s [1], adaptive importance sam-

pling from multiple predictions can balance both accuracy and robustness of the

estimation with a restricted numbers of particles. Along this line, overt/covert

smooth pursuit in primates could be a research target to investigate our conjec-

ture. Based on the model of Shibata, et al. [12], Kawawaki et al. investigated the

human brain mechanism of overt/covert smooth pursuit by fMRI experiments

and suggested that the activity of the anterior/superior lateral occipito-temporal

cortex (a/sLOTC) was responsible for target motion prediction rather than mo-

tor commands for eye movements [7]. Note that LOTC involves the monkey

medial superior temporal (MST) homologue responsible for visual motion pro-

cessing (e.g., [9][6]) In their study, the mechanism for increasing the a/sLOTC

activity remained unclear. The increase in the a/sLOTC activity was observed

particularly when subjects pursued blinking target motion covertly. This blink

condition might cause two predictions, e.g., one emphasizes observation and the

other its belief as proposed in [1]), and require the computational resource for

adaptive sampling.

Multiple predictions might be performed in other brain regions such as frontal

eye ﬁled (FEF), the inferior temporal (IT) area, fusiform face area (FFA). It is

known that FEF involved in smooth pursuit (e.g., [3]), and it has reciprocal

connection to the MST area (e.g., [13]), but how they work together is unclear.

Visual tracking for more general object tracking rather than a small spot as a

visual stimulus requires a speciﬁc target representation and distractors’ repre-

sentation inculding a background. So the IT, FFA and other areas related to

higher-order visual representation would be making parallel predictions to deal

with the varying target appearance during tracking.

Visual Tracking Achieved by Adaptive Sampling

613

Conclusion

In this paper, ﬁrst we have introduced particle ﬁlters (PFs) as an approximated

incremental Bayesian computation, and pointed out their drawbacks. Then, we

have proposed a novel framework for visual tracking based on PFs as a solution

to the drawback. The keys of the framework are: (1) high-dimensional state

space is decomposed into hierarchical and parallel predictors which treat state

variables in the lower dimension, and (2) their integration is achieved by adaptive

sampling. The feasibility of our frame work has been demonstrated by real as

well as simulation studies. Finally, we have pointed out the shared computational

problems between PFs and human visual tracking, presented our conjecture

that at least the primate’s brain employs PFs, and discussed its possibility and

perspectives for future investigations.

References

1. Bando, T., Shibata, T., Doya, K., Ishii, S.: Switching particle ﬁlters for eﬃcient

visual tracking. J. Robot Auton. Syst. 54(10), 873 (2006)

2. Cavanagh, P., Alvarez, G.A.: Tracking multiple targets with multifocal attention.

Trends in Cogn. Sci. 9(7), 349–354 (2005)

3. Fukushima, K., Yamanobe, T., Shinmei, Y., Fukushima, J., Kurkin, S., Peterson,

B.W.: Coding of smooth eye movements in three-dimensional space by frontal

cortex. Nature 419, 157–162 (2002)

4. Gordon, N.J., Salmond, J.J., Smith, A.F.M.: Novel approach to nonlinear non-

Gaussian Bayesian state estimation. IEEE Proc. Radar Signal Processing 140, 107–

113 (1993)

5. Isard, M., Blake, A.: Condensation - conditional density propagation for visual

tracking. Int. J. Comput. Vis. 29(1), 5–28 (1998)

6. Kawano, M., Shidara, Y., Watanabe, Y., Yamane, S.: Neural activity in cortical

area MST of alert monkey during ocular following responses. J. Neurophysiol 71(6),

2305–2324 (1994)

7. Kawawaki, D., Shibata, T., Goda, N., Doya, K., Kawato, M.: Anterior and superior

lateral occipito-temporal cortex responsible for target motion prediction during

overt and covert visual pursuit. Neurosci. Res. 54(2), 112

8. Knill, D.C., Pouget, A.: The Bayesian brain: the role of uncertainty in neural coding

and computation. Trends in Neurosci. 27(12) (2004)

9. Newsome, W.T., Wurtz, H., Komatsu, R.H.: Relation of cortical areas MT and

MST to pursuit eye movements. II. Diﬀerentiation of retinal from extraretinal

inputs. J. Neurophysiol 60(2), 604–620 (1988)

10. Rao, R.P.N.: The Bayesian Brain: Probabilistic Approaches to Neural Coding. In:

Neural Models of Bayesian Belief Propagation, MIT Press, Cambridge (2006)

11. Sato, M., Ishii, S.: On-line EM algorithm for the normalized gaussian network.

Neural Computation 12(2), 407–432 (2000)

12. Shibata, T., Tabata, H., Schaal, S., Kawato, M.: A model of smooth pursuit in

primates based on learning the target dynamics. Neural Netw. 18(3), 213

13. Tian, J.-R., Lynch, J.C.: Corticocortical input to the smooth and saccadic eye

movement subregions of the frontal eye ﬁeld in cebus monkeys. J. Neurophys-

iol 76(4), 2754–2771 (1996)

Bayesian System Identiﬁcation of Molecular

Cascades

Junichiro Yoshimoto

1,2

and Kenji Doya

1,2,3

Initial Research Project, Okinawa Institute of Science and Technology Corporation

12-22 Suzaki, Uruma, Okinawa 904-2234, Japan

{jun-y,doya}@oist.jp

Graduate School of Information Science, Nara Institute of Science and Technology

8916-5 Takayama, Ikoma, Nara 630-0192, Japan

ATR Computational Neuroscience Laboratories

2-2-2 Hikaridai, “Keihanna Science City”, Kyoto 619-0288, Japan

Abstract. We present a Bayesian method for the system identiﬁcation

of molecular cascades in biological systems. The contribution of this

study is to provide a theoretical framework for unifying three issues: 1)

estimating the most likely parameters; 2) evaluating and visualizing the

conﬁdence of the estimated parameters; and 3) selecting the most likely

structure of the molecular cascades from two or more alternatives. The

usefulness of our method is demonstrated in several benchmark tests.

Keywords: Systems biology, biochemical kinetics, system identiﬁcation,

Bayesian inference, Markov chain Monte Carlo method.

Introduction

In recent years, the analysis of molecular cascades by mathematical models has

contributed to the elucidation of intracellular mechanisms related to learning

and memory [1,2]. In such modeling studies, the structure and parameters of the

molecular cascades are selected based on the literature and databases

. How-

ever, if reliable information about a target molecular cascade is not obtained

from those repositories, we must tune its structure and parameters so as to ﬁt

the model behaviors to the available experimental data. The development of a

theoretically sound and eﬃcient system identiﬁcation framework is crucial for

making such models useful. In this article, we propose a Bayesian system iden-

tiﬁcation framework for molecular cascades.

For a given set of experimental data, the system identiﬁcation can be sep-

arated into two inverse problems: parameter estimation and model selection.

The most popular strategy for parameter estimation is to ﬁnd a single set of

parameters based on the least mean-square-error or maximum likelihood cri-

terion [3]. However, we should be aware that the estimated parameters might

DOQCS ( http://doqcs.ncbs.res.in/) and BIOMODELS ( http://biomodels.

net/) are available for example.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 614–624, 2008.

c Springer-Verlag Berlin Heidelberg 2008

Bayesian System Identiﬁcation of Molecular Cascades

615

suﬀer from an “over-ﬁtting eﬀect” because the available set of experimental data

is often small and noisy. For evaluating the accuracy of the estimators, statisti-

cal methods based on the asymptotic theory [4] and Fisher information [5] were

independently proposed. Still, we must pay attention to practical limitations: a

large number of data are required for the former method; and the mathematical

model should be linear at least locally for the latter method. For model selec-

tion, Sugimoto et al. [6] proposed a method based on genetic programming. This

provided a systematic way to construct a mathematical model, but a criterion

to adjust a tradeoﬀ between data ﬁtness and model complexity was somewhat

heuristic. Below, we present a Bayesian formulation for the system identiﬁcation

of molecular cascades.

Problem Setting

In this study, we consider the dynamic behavior of molecular cascades that can

be modeled as a well-stirred mixture of I molecular species

1

, . . . , S

}, which

chemically interact through J elementary reaction channels

, . . . , R

} inside

a compartment with a ﬁxed volume Ω and constant temperature [7]. The dy-

namical state of this system is speciﬁed by a vector X(t)

≡ (X

1

(t), . . . , X

(t)),

where X

(t) is the number of the molecules of species S

in the system at time

t. If we assume that there are a suﬃciently large number of molecules within

the compartment, it is useful to take the state vector as the molar concentra-

tions x(t)

≡ (x

(t), . . . , x

(t)) by the transformation of x(t) = X(t)/(ΩN

where N

≈ 6.0 × 10

is Avogadro’s constant. In this case, the system behavior

can be well approximated by the deterministic ordinary diﬀerential equations

(ODEs) [8]:

˙x

i

(t) =

j=1

(ν

− ν

l=1

(t)

i = 1, . . . , I,

(1)

where ˙

≡ dx/dt denotes the time-derivative (velocity) of a variable x. ν

and

are the stoichiometric coeﬃcients of the molecular species S

as a reactant

and a product, respectively, in the reaction channel R

. k

is the rate constant

of the reaction channel R

. Equation (1) is called the law of mass action.

To illustrate an example of this mathematical model, let us consider a simple

interaction whose chemical equation is given by

A + B

k

f

−→

←−

AB.

(2)

Here, the two values associated with the arrows (k

and k

) are the rate constants

of their corresponding reactions. The dynamic behavior of this system is given

[ ˙

AB] =

−[ ˙A] = −[ ˙B] = k

[A][B]

− k

[AB],

(3)

where the bracket [S

] denotes the concentration of the molecular species S

Enzymatic reactions, which frequently appear in biological signalings, can

be modeled as a series of elementary reactions. In an enzymatic reaction that

616

J. Yoshimoto and K. Doya

changes a substrate S into a product P by the catalysis of an enzyme E, for exam-

ple, the enzyme-substrate complex ES is regarded as the intermediate product,

and the overall reaction is modeled as

E + S

−→

←−

cat

−→ E + P.

(4)

We can derive a system of ODEs corresponding to the chemical equation (4)

based on the law of mass action (1), but it is also common to use the approxi-

mation form called the Michaelis-Menten equation [8]:

[ ˙

P] =

−[ ˙S] = (k

cat

[E

tot

][S]) / (K

+ [S]) ,

(5)

where [E

tot

]

≡ [E] + [ES] is the total concentration of the molecular species

including the enzyme E. K

≡ (k

+ k

cat

)/k

is called the Michaelis-Menten

constant.

When considering signaling communications among other compartments and

biochemical experiments wherein drugs are injected, it is convenient to separate a

control vector u

≡ (u

1

, . . . , u

), which denotes I external eﬀects on the system,

from a state vector x. Let us consider a model (2), in which [B] can be arbitrarily

manipulated by an experimenter. In this case, the control vector is u = ([B]),

and the state vector is modiﬁed into x = ([A], [AB]).

By summarizing the above discussion, the dynamic models of molecular cas-

cades concerned in this study assume a general form of

˙x(t) = f (x(t), u(t), q

)

subject to

x(0) = q

(6)

where q

∈ R

is a set of I initial values of state variables

. q

∈ R

is a set of

M system parameters such as rate constants and Michaelis-Menten constants.

f : (x, u, q

)

→ ˙x is a function derived based on the law of mass action.

We assume that a set of control inputs U

≡ {u(t)|t ∈ [0, T

]

} has been

designed before commencing an experiment, where T

is a ﬁnal time point of

the experiment. Let

O ⊂ {1, . . . , I} be a set of indices such that the concentration

for any species in

|i ∈ O} can be observed through the experiment. Then,

let t

, . . . , t

∈ [0, T

] be the time points at which the observations are made

in the experiment. Using these notations, a set of all the data observed in the

experiment is deﬁned by Y

≡ {y

i

(t

)

|i ∈ O; n = 1, . . . , N}, where y

) is the

concentration of the species S

observed at time t

The objective of this study is to identify the mathematical model (6) from a

limited number of experimental data Y . This objective can be separated into two

inverse problems. One problem involves ﬁtting a parameter vector q

≡ (q

1

, q

)

so as to minimize the residuals

) = y

)

− x

; q, U ), where x

; q, U )

denotes the solution of the ODEs (6) at time t

that depends on the param-

eter vector q and control inputs U . In other words, we intend to ﬁnd q such

that model (6) reproduces the observation data Y as closely as possible. This

inverse problem is called parameter estimation. The second problem is to in-

fer the most likely candidate among multiple options

1

, . . . ,

} with dif-

ferent model structures (functional forms of f ) from the observation data Y .

We denote the set of all non-negative real values by

in accordance with usage.

Bayesian System Identiﬁcation of Molecular Cascades

617

This problem is called model selection, and we call each

(c = 1, . . . , C) a

“model candidate” in this study.

Bayesian System Identiﬁcation

In this section, we ﬁrst present our Bayesian approach to the parameter estima-

tion problem, and then extend the discussion to the model selection problem.

In statistical inference, the residuals

{

)

|i ∈ O; n = 1, . . . , N} are mod-

eled as unpredictable noises. At ﬁrst, we present a physical interpretation of

the noises. Let us assume that each observation y

) is obtained by sampling

particles of the molecular species S

from a small region with a ﬁxed volume ω

(0 < ω

i

< Ω) inside the model compartment. Since the interior of the model

compartment is well-stirred, the particles of S

would be distributed by a spa-

tial Poisson distribution that is independent of the states of the other molecular

species. This implies that the number of particles of the sampled species, X

i

(t

would be distributed according to a binomial distribution with the total number

of trials X

) and success probability (ω

/Ω) [9]. Also, this distribution can be

well approximated by X

)

i.i.d.

∼ N (ω

)/Ω, ω

)/Ω)

from the as-

sumption of X

)

0. Using the two relationships, y

) = X

)/(ω

)

and x

) = X

)/(ΩN

), we have y

(t

n

)

i.i.d.

∼ N (x

), x

)/(ω

)),

that is,

)

i.i.d.

∼ N (0, x

)/γ

) ,

(7)

where γ

≡ ω

. In general, we cannot determine the exact values of γ

≡

(γ

)

∈O

∈ R

|O|

; hence, they are regarded as latent variables to be estimated

along with the parameter q via the Bayesian inference.

Now, we consider the Bayesian inference of the set of all the unknown variables

≡ (q, γ), i.e., the posterior distribution of θ. By combining (7) with (6), we

have the conditional distribution of the observation data Y :

p(Y

|θ; U) =

n=1

∈O

); x

; q, U ), x

; q, U )/γ

) ,

(8)

where

(

·; ·, ·) denotes a p-dimensional Gaussian density function

. According

to Bayes’ theorem, the posterior distribution of θ is given by

p(θ

|Y ; U) =

p(Y

|θ;U)p(θ;U)

p(Y ;U )

p(Y

|θ;U)p(θ;U)

p(Y

|θ;U)p(θ;U)dθ

(9)

where p(θ; U ) is a prior distribution of θ deﬁned before obtaining Y . In this

study, we employ the prior distribution that has a form of

p(θ; U ) =

I+M

i=1

L (q

; ξ

)

∈O

G (γ

; α

, β

) ,

(10)

where q

and γ

are the i-th elements of the vectors q and γ, respectively.

0

, β

, and ξ

are small positive constants.

L (·; ·) and G (·; ·, ·) denote the density

N(μ, σ

) denotes a Gaussian (or normal) distribution with mean μ and variance σ

.

4

(x; m, S)

≡ (2π)

−p/2

|S|

−1/2

exp

{− (x − m) S

−1

(x

− m) /2}, where it should

be noted that all vectors are supposed to be row vectors throughout this article.

618

J. Yoshimoto and K. Doya

functions of a log-Laplace distribution and a gamma distribution, respectively

By substituting (8) and (10) for (9), the posterior distribution is well-deﬁned.

The posterior distribution can be regarded as a degree of certainty of θ de-

duced from the outcome Y [10]. The most “likely” value of θ can be given by the

maximum a posteriori (MAP) estimator θ

M AP

≡ argmax

p(θ

|Y ; U). This value

is asymptotically equivalent to the solutions of the maximum likelihood method

and a generalized least-mean-square method as the number of time points N

approaches inﬁnity. We must be aware that the MAP estimator asymptotically

has a variance of O(1/N ) as long as N is ﬁnite. Accordingly, it is informative to

evaluate a high-conﬁdence region of θ. The Bayesian inference can be used for

this purpose, as will be shown in Section 4.

Although we have considered the parameter estimation problem for a given

model candidate so far, the denominator of (9), which is called the evidence,

plays an important role in the model selection problem. To make the dependence

on model candidates explicit, let us write the evidence p(Y ; U ) for each candidate

(c = 1, . . . , C) as p(Y

; U ). The posterior distribution of

is given by

|Y ; U) ∝ p(Y |C

; U ) if the prior distribution over

, . . . ,

} is uniform.

This implies that the model candidate

∗

, where

∗

= argmax

p(Y

; U ), is

the optimal solution in the sense of the MAP estimation. Accordingly, the main

task in the Bayesian model selection reduces to the calculation of the evidence

p(Y ; U ) for each model candidate

The major diﬃculty encountered in the Bayesian inference is that an in-

tractable integration of the denominator in the ﬁnal equation in (9) must be

evaluated. For convenience of implementation, we adopt a Markov chain Monte

Carlo technique [11] to approximate the posterior distribution and evidence.

Hereafter, we describe only the key essence of our approximation method.

The basic framework of our approximation is based on a Gibbs sampling

method [11]. We suppose that it is possible to draw the realizations q and γ

directly from two conditional distributions with density functions p(q

|γ, Y ; U)

and p(γ

|q, Y ; U), respectively. In this case, the following iterative procedures

asymptotically generate a set of L (L

1) random samples Θ

1:L

≡ {θ

(l)

≡

(l)

, γ

(l)

)

|l = 1, . . . , L} distributed according to the density function p(θ|Y ; U).

1. Initialize the value of γ

(0)

appropriately and set l = 1.

2. Generate q

(l)

from the distribution with the density p(q

|γ

−1)

, Y ; U ).

3. Generate γ

(l)

from the distribution with the density p(γ

(l)

, Y ; U ).

4. Increment l := l + 1 and go to Step 2.

Since we select a gamma distribution as the prior distribution of γ

in this study,

p(γ

|q, Y ; U) always has the form

∈O

G (γ

|α, β(q)), where

α = α

0

+

and

β(q) = β

n=1

)

−x

;q,U ))

;q,U )

(11)

This implies that it is possible to draw γ directly from p(γ

|q, Y ; U) using a

gamma-distributed random number generator in Step 3. On the other hand, the

L (x; ξ) ≡ exp{−| ln x|/ξ}/(2ξx) and G (γ

; α, β)

≡ β

−1

−βx

/Γ (α), where

Γ (α)

≡

∞

−1

−t

dt is the gamma function.

Bayesian System Identiﬁcation of Molecular Cascades

619

analytical calculation of p(q

|γ, Y ; U) is still intractable; however, we know that

p(q

|γ, Y ; U) is proportional to p(Y |θ; U)p(θ; U) up to the normalization con-

stant. Thus, we approximate Step 2 by a Metropolis sampling method [11]. Here,

a transition distribution of the Markov chain is constructed based on an adaptive

direction sampling method [12] in order to improve the eﬃciency of the sam-

pling. Indeed, it is required to compute the solution of the ODEs x

(t

n

; q

(l)

, U )

explicitly in both Steps 2 and 3; however, this is not signiﬁcant because many

numerical ODE solvers are currently available (see [13] for example).

After the procedures generate L random samples Θ

1:L

distributed according

to the density function p(θ

|Y ; U), the target function p(θ|Y ; U) can be well

reconstructed by the kernel density estimator [14]:

p(θ

|Y ; U) =

l=1

θ; θ

(l)

, V ; V =

+2)L

(Dθ +4)

Cov(Θ

1:L

), (12)

where D

is the dimensionality of the vector θ, and Cov(Θ

1:L

) is the covariance

matrix of the set of samples Θ

1:L

. Futhermore, assuming that ˜

p(θ

|Y ; U) is a

good approximator of the true posterior distribution p(θ

|Y ; U), the importance

sampling theory [11] states that the integration of the denominator in (9) (i.e.,

the evidence p(Y ; U )) can be well approximated by

p(Y ; U ) =

l=1

p(Y

|θ

(l)

;U )p(

(l)

;U )

(l)

|Y ;U)

(13)

Numerical Demonstrations

In order to demonstrate the usefulness, we applied our Bayesian method to sev-

eral benchmark problems. For all the benchmark problems, the set of constants

for the prior distribution were ﬁxed at α

= 10

−4

, β

= 1, and ξ

= 1. The total

number of Monte Carlo samples were set at L = 10

, and every element in the

initial sample θ

(0)

was drawn from a uniform distribution within the range of

[0.99

× 10

−10

, 1.01

× 10

−10

In the ﬁrst benchmark, we supposed a situation where the structure of the

molecular cascades was ﬁxed by the chemical equation (2), but [A] was un-

observable at any time point. Instead of real experiments, the observations

Y with the total number of time points N = 51 were synthetically gener-

ated by (3) using the parameters q

= (k

, k

) = (5.0, 0.01) and the initial

state q

0

= ([A](0), [B](0), [AB](0)) = (0.4, 0.3, 0.01), where the noises in the ob-

servation processes were added according to the model (7) with the parameter

= 1000. The circles and squares in Fig.1(a) show the generated observation Y .

For the observation data Y , we evaluated the posterior distribution of the set of

all the unknown variables θ using the method described in Section 3. Figures 1(b-

f) show the posterior distributions of non-trivial unknown variables (k

, k

, [A](0)

and γ

for species B and AB), respectively, after marginalizing out any other vari-

ate. Here, the bold-solid lines denote the true values used in generating data Y ,

and the bold-broken lines are MAP estimators that were approximately obtained

by argmax

∈{1,...,L}

p(Y

|θ

(l)

; U )p(θ

(l)

; U ). The histograms indicate the distribu-

tion of Monte Carlo samples Θ

1:L

, and the curves surrounding the histograms

620

J. Yoshimoto and K. Doya

1

2

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Time

Concentration

Observation data Y and reconstruction

[B]

obs

[AB]

obs

[B]

model

[AB]

model

0.1

0.2

0.3

0.4

0.5

0.6

p(k

|Y;U)

Posterior distribution: P(k

|Y;U)

hist

pdf

true

MAP

0.5%

99.5%

0.01

0.02

0.03

0.04

0.05

p(k

|Y;U)

Posterior distribution: P(k

|Y;U)

hist

pdf

true

MAP

0.5%

99.5%

(a)

(b)

(c)

0.3

0.35

0.4

0.45

0.5

0.55

0.6

[A](0): initial concentration of species A

p([A](0)|Y;U)

Posterior distribution: P([A](0)|Y;U)

hist

pdf

true

MAP

0.5%

99.5%

500

1000

1500

2000

0.5

1.5

x 10

-3

(for species B)

γ

2

|Y;U)

Posterior distribution: P(

2

|Y;U)

hist

pdf

true

MAP

0.5%

99.5%

400

600

800

1000

1200

0.5

1.5

2.5

x 10

-3

(for species AB)

|Y;U)

Posterior distribution: P(

|Y;U)

hist

pdf

true

MAP

0.5%

99.5%

(d)

(e)

(f)

Fig. 1. Brief summaries of results in the ﬁrst benchmark

f

k

Posterior distribution: P(k

,k

b

|Y)

0.005

0.01

0.015

0.02

0.025

0.03

[A](0)

Posterior distribution: P(k

,[A](0)|Y)

0.3

0.35

0.4

0.45

0.5

[A](0)

Posterior distribution: P(k

,[A](0)|Y)

0.01

0.02

0.03

0.3

0.35

0.4

0.45

0.5

500

1000

1500

Fig. 2. The joint posterior distributions of two out of three parameters (k

, k

, [A](0))

denote the kernel density estimators (12). Though model (3) with the MAP es-

timators could reproduce the observation data Y well (see solid and broken lines

in Fig.1(a)), the estimators were slightly diﬀerent from their true values because

the total number of data was not very large. For this reason, the broadness of the

posterior distribution provided useful information about the conﬁdence in infer-

ence. To evaluate the conﬁdence more quantitatively, we deﬁned “99% credible

interval” as the interval between 0.5% and 99.5% percentiles along each axis of

1:N

. The dotted lines in Figs.1(b-f) show the 99% credible interval. Since the

posterior distribution was originally a joint probability density of all the unknown

variables θ, it was also possible to visualize the highly conﬁdent parameter space

even in higher dimensions, which provided useful information about the depen-

dence among the parameters. The three panels in Fig.2 show that the joint poste-

rior distribution of two out of the three parameters (k

, k

, and [A](0)), where the

white circles denote the true values. From the panels, we can easily observe that

these parameters have a strong correlation between them. Interestingly, highly

Bayesian System Identiﬁcation of Molecular Cascades

621

Table 1. MAP estimators and 99% credible intervals in the second benchmark

Parameters True MAP 99% itvls.

(

×10

−5

) 7.00 7.00 [6.83,7.19]

2

(

×10

) 6.00 6.22 [5.77,6.66]

syn

(

×10

−2

) 1.68 1.97 [1.46,3.97] K

(

×10

−2

) 1.00 0.74 [0.28,1.28]

10

20

0.5

1.5

2.5

Time

Concentration

Observation data Y and reconstruction

[B]

model

[B]

obs

[S]

model

[S]

obs

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Time

Concentration

Observation data Y and reconstruction

[M1]

model

[M1]

obs

[M2]

model

[M2]

obs

[M3]

model

[M3]

obs

syn

Posterior distribution: P(k

syn

|Y)

0.015

0.02

0.025

0.03

0.002

0.004

0.006

0.008

0.01

0.012

0.5

1.5

x 10

(a)

(b)

(c)

Fig. 3. (a-b) Observation data Y and state trajectories reconstructed in the second

benchmark; and (c) the joint posterior distribution of (k

syn

, K

)

conﬁdent spaces over (k

-[A](0)) and (k

-[A](0)) had hyperbolic shapes that re-

ﬂected the bilinear form in the ODEs (3).

To demonstrate the applicability to more realistic molecular cascades, we

adopted a simulator of a bioreactor system developed by [15] in the second

benchmark. The mathematical model corresponding to this system was given by

the following ODEs:

[ ˙

B] = (μ

− q

)[B]; [ ˙S] = q

(c

in

− [S]) − r

[B]

[ ˙

M1] = r

− r

− μ[M1]; [ ˙

M2] = r

2

− r

− μ[M2]; [ ˙

M3] = r

syn

− μ[M3]

≡

1,max

[S]

+[S]

; r

≡

[M3][M1]

+[M1]

; r

≡

3,max

[M2]

+[M2]

; r

syn

≡

syn

+[M2]

; μ = Y

where M

, r

1,max

, K

, r

3,max

, and K

were assumed to be known con-

stants that were same as those reported in [15]. u

≡ (c

, q

) were control inputs

in this simulator, and all the state variables x

≡ ([B], [S], [M1], [M2], [M3]) were

observable. The observation time-series Y were downloaded from the website

The objective in this benchmark was to estimate four unknown system param-

eters q

≡ (Y

, k

syn

, K

). As a result of evaluating the posterior distribution

over all the elements of θ, the MAP estimators and 99% credible intervals of q

were obtained, and are shown in Table 1. Indeed, the observation noises in this

benchmark were distributed according to

)

i.i.d.

∼ N(0, (0.1x

))

), which

was diﬀerent from our model (7). This inconsistency and small amount of data

led to MAP estimators that were deviated slightly from the true values. How-

ever, it was notable that the model with the MAP estimators could reproduce

the observation data Y fairly well, as shown in Figs.3(a-b), so that the credible

intervals provide good information about the plausibility/implausibility of the

http://sysbio.ist.uni-stuttgart.de/projects/benchmark

622

J. Yoshimoto and K. Doya

1

2

0.2

0.4

0.6

[S]

Observation data Y

and reconstruction

Obs.

0.1

0.2

Time

[P]

Obs.

0.2

0.4

0.6

[S]

Observation data Y

and reconstruction

Obs.

0.2

0.4

Time

[P]

Obs.

100

200

300

400

373.41

168.29

306.69

Model candidate

log-evidence: ln p(Y

|C;U)

Log-evidence of three candidates

100

200

300

189.74

246.92

246.16

Model candidate

log-evidence: ln p(Y

|C;U)

Log-evidence of three candidates

(a) for observation data Y

(b) for observation data Y

Fig. 4. Brief summaries of results in the model selection problem

estimated parameters. In addition, we could ﬁnd a nonlinear dependence between

the two parameters (k

syn

, K

) even in this problem, as shown in Fig.3(c).

Finally, our Bayesian method was applied to a simple model selection problem.

In this problem, we supposed that the observation data were obtained from the

molecular cascades whose chemical equations were written as

k=1

−→ S; S + E

f 1

−→

←−

f 2

−→

←−

P + E,

(14)

where x

≡ ([S], [P], [E], [SE]) was a state vector of our target compartment, and

only two elements, [S] and [P], were observable. u

≡ ([S

in

]) was a control input

to the compartment. We considered three model candidates depending on the

reversibility of the reaction:

1

: The ODEs consistent with model (14) was constructed based on the law of

mass action without any approximation. In this case, four system parameters

f 1

, k

f 2

, and k

) were estimated.

2

: k

was suﬃciently small so that the second part of model (14) was reduced

to model (4). The estimated parameters were also reduced to (k

f 1

, k

f 2

: In addition to the assumption of

2

, the second part of model (14) was well

approximated by the ODEs (5). In this case, the system parameters were

further reduced to (K

M 1

, k

f 2

), where K

M 1

≡ (k

+ k

f 2

)/k

f 1

The goal of this benchmark is to select the most reasonable candidate from

} for the given observation data Y . We prepared two sets of observation

data Y

and Y

that were generated by model candidates

1

and

, respectively.

The circles in the ﬁrst and second panels in Figs.4(a-b) show the points of data Y

and Y

. Here, the curves associated with the circles denote the state trajectories

Bayesian System Identiﬁcation of Molecular Cascades

623

reconstructed by the three model candidates with their MAP estimators. Then,

we evaluated the evidence ˜

p(Y

d

|C

; U ) for d = 1, 2 and c = 1, . . . , 3 by Eq.(13).

The results are shown in the bottom panels in Figs.4(a-b), where the evidence is

transformed into the natural logarithm to avoid numerical underﬂow. As shown

in Fig.4(a), only

could satisfactorily reproduce the observation data Y

, while

the others could not. Accordingly, the evidence for the model candidate C

was

maximum. For the observation data Y

, all the model candidates could repro-

duce them satisfactorily in a similar manner, as shown in Fig.4(b). In this case,

the evidence for the simplest model candidate C

was nearly as high as that for

, from which the observation data Y

were generated. These results demon-

strated that the evidence provided useful information for determining whether

a structure of molecular cascades could be further simpliﬁed or not.

Conclusion

In this article, we presented a Bayesian method for the system identiﬁcation of

molecular cascades in biological systems. The advantage of this method is to

provide a theoretically sound framework for unifying parameter estimation and

model selection. Due to the limited space, we have shown only the small-scale

benchmark results here. However, we have also conﬁrmed that the precision

of our MAP estimators was comparable to that of an existing good parame-

ter estimation method in a medium-scale benchmark with 36 unknown system

parameters [3]. The current limitations of our method are diﬃculty in the conver-

gence diagnosis of the Monte Carlo sampling and time-consuming computation

for large-scale problems. We will overcome these limitations in our future work.

References

1. Bhalla, U.S., Iyengar, R.: Emergent properties of networks of biological signaling

pathways. Science 387, 283–381 (1999)

2. Doi, T., et al.: Inositol 1,4,5-trisphosphate-dependent Ca

threshold dynamics

detect spike timing in cerebellar Purkinje cells. The Journal of Neuroscience 25(4),

950–961 (2005)

3. Moles, C.G., et al.: Parameter estimation in biochemical pathways: A comparison

of global optimization methods. Genome Research 13(11), 2467–2474 (2003)

4. Faller, D., et al.: Simulation methods for optimal experimental design in systems

biology. Simulation 79, 717–725 (2003)

5. Banga, J.R., et al.: Computation of optimal identiﬁcation experiments for nonlinear

dynamic process models: A stochastic global optimization approach. Industrial &

Engineering Chemistry Research 41, 2425–2430 (2002)

6. Sugimoto, M., et al.: Reverse engineering of biochemical equations from time-course

data by means of genetic programming. BioSystems 80, 155–164 (2005)

7. Gillespie,

D.T.:

The

chemical

langevin

equation.

Journal

Chemical

Physics 113(1), 297–306 (2000)

8. Crampin, E.J., et al.: Mathematical and computational techniques to deduce com-

plex biochemical reaction mechanisms. Progress in Biophysics & Molecular Biol-

ogy 86, 77–112 (2004)

624

J. Yoshimoto and K. Doya

9. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes, 2nd

edn. Springer, Heidelberg (2003)

10. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. John Wiley & Sons Inc., Chich-

ester (2000)

11. Andrieu, C., et al.: An introduction to MCMC for machine learning. Machine

Learning 50(1-2), 5–43 (2003)

12. Gilks, W.R., et al.: Adaptive direction sampling. The Statistician 43(1), 179–189

(1994)

13. Hindmarsh, A.C., et al.: Sundials: Suite of nonlinear and diﬀerential/algebraic

equation solvers. ACM Transactions on Mathematical Software 31, 363–396 (2005)

14. Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman

and Hall, Boca Raton (1986)

15. Kremling, A., et al.: A benchmark for methods in reverse engineering and model

discrimination: Problem formulation and solutions. Genome Research 14(9), 1773–

1785 (2004)

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 625–634, 2008.

Download 12.42 Mb.

Do'stlaringiz bilan baham:

1 ... 53 54 55 56 57 58 59 60 ... 88