[Figure: two panels, (a) SDM and IN, (b) SDM and CN, plotting residual energy E (log scale) against time, 10 cities]
Fig. 3. Residual energy for (a) independent noise (IN) and (b) correlated noise (CN), 10 cities

… the number of times the optimum solution was obtained by the method with correlated noise is 90. The proposed method obtains the most solutions among these methods. Next, Fig. 3 shows the transition of the residual energy for the methods with independent noise and with correlated noise, where the internal state is updated 1,000,000 times. The residual energy E_res is the difference between the energy E(t) of the state V_xi at time t and the energy of the optimum solution, E_opt:

E_res = E(t) − E_opt .  (15)

We found that the residual energy does not go down to 0 with the independent-noise method, but it does with the proposed method.
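To make the contrast between the two noise types concrete, here is a minimal sketch in Python. The excerpt does not fully specify the network, so the unit count, the function names, and in particular the form of the correlated noise (a single Gaussian draw shared by all units, as in correlated-noise associative memory models) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10 * 10  # one unit V_xi per (city, visit-order) pair for 10 cities

def independent_noise(sigma2_zeta, size=N):
    # IN: i.i.d. Gaussian noise, a separate draw zeta_xi for every unit
    return rng.normal(0.0, np.sqrt(sigma2_zeta), size=size)

def correlated_noise(sigma2_eta, size=N):
    # CN (assumed form): one Gaussian draw eta shared by all units,
    # making the noise perfectly correlated across the network
    return np.full(size, rng.normal(0.0, np.sqrt(sigma2_eta)))

def residual_energy(E_t, E_opt):
    # Eq. (15): E_res = E(t) - E_opt
    return E_t - E_opt
```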
[Figure: histogram with the ratio R on the abscissa (1.0–1.8) and the percentage of obtained solutions on the ordinate (0–30%), with CN and IN series, 29 cities]

Fig. 4. Histogram of path length for 29 cities

Here we assumed β = 0.35; the variance of the independent noise and that of the correlated noise are σ²_ζ = σ²_η = 0.04. The number of solutions obtained by the CN method is 979, that by the IN method is 931, and that by the SDM is 883.
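The ratio R is not defined in this excerpt; assuming it is the obtained tour length divided by the optimum tour length (so R = 1 means the optimum was found), the histogram of Fig. 4 could be tallied as follows, with all names hypothetical.

```python
import numpy as np

def ratio_histogram(tour_lengths, optimum_length, bin_edges):
    # Assumed definition: R = (obtained tour length) / (optimum tour length)
    R = np.asarray(tour_lengths) / optimum_length
    counts, _ = np.histogram(R, bins=bin_edges)
    # ordinate of Fig. 4: percentage of obtained solutions falling in each bin
    return 100.0 * counts / len(R)

# e.g. eight bins spanning the plotted range R = 1.0 ... 1.8
percent = ratio_histogram([102.0, 110.0, 110.0, 135.0], 100.0,
                          np.linspace(1.0, 1.8, 9))
```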
[Figure: solution ratio (%) on the ordinate (0–100) against the noise variance σ²_ζ or σ²_η on the abscissa (0.02–0.10), with curves CN (β = 0.50), CN (β = 0.35), IN (β = 0.35), and IN (β = 0.50), 29 cities]
Fig. 5. Solution ratio (%) for 29 cities

For the 29 cities, we calculated solutions with the variances of the independent noise σ²_ζ = 0.01 ∼ 0.10 and the variances of the correlated noise σ²_η = 0.01 ∼ 0.10. None of these methods obtained the optimum solution for this city layout. Therefore, we calculated the path lengths of the obtained solutions. Figure 4 shows the histogram of path lengths when solutions are obtained. The abscissa represents the ratio R, and the ordinate represents the percentage of obtained solutions having ratio R. Both the method with independent noise and the proposed method obtain better solutions than the steepest descent method.

We compared the solution ratio of the method with independent noise with that of the proposed method. Figure 5 shows the solution ratio against the variances σ²_ζ and σ²_η in the cases of β = 0.35 and 0.50. Table 1 shows the optimum variance and the solution ratio. In the case of β = 0.35, the number of solutions obtained by the method with independent noise at σ²_ζ = 0.03 is 93, while the number obtained by the proposed method at σ²_η = 0.03 is 98; there is not much difference between them. On the other hand, in the case of β = 0.5, the number of solutions obtained by the method with independent noise at σ²_ζ = 0.05 is 56, while that obtained by the proposed method at σ²_η = 0.05 is 75. Namely, the proposed method obtains better solutions than the method with independent noise. We therefore found that the proposed method can be much more effective than the method with independent noise at obtaining solutions, depending on β.

Table 1. The optimum variance and the solution ratio for β = 0.35 and β = 0.5

                      β = 0.35                      β = 0.5
optimum variance      σ²_ζ = 0.03    σ²_η = 0.03    σ²_ζ = 0.05    σ²_η = 0.05
solution ratio (%)    93             98             56             75

5 Conclusion

In associative memory models, correlated noise is effective in inducing state transitions. In this paper, we proposed a solution method using correlated noise and applied it to the TSP, a typical NP-hard combinatorial optimization problem. As the results show, for the case of 10 cities, the proposed method with correlated noise obtains more solutions than both the steepest descent method and the method with independent noise. For the case of 29 cities, none of these methods obtained an optimum solution. However, we found that the proposed method can obtain better solutions than the existing methods, depending on β. From these results, we showed that correlated noise is also effective for combinatorial optimization problems.

Acknowledgments. This work was partially supported by a Grant-in-Aid for Young Scientists (B) No. 16700210. The computer simulation results were obtained using the PC cluster system at Yamaguchi University.
Bayesian Collaborative Predictors for General User Modeling Tasks

Jun-ichiro Hirayama¹, Masashi Nakatomi², Takashi Takenouchi¹, and Shin Ishii¹,³

¹ Graduate School of Information Science, Nara Institute of Science and Technology, Takayama 8916-5, Ikoma, Nara
{junich-h,ttakashi}@is.naist.jp
² Ricoh Company, Ltd.
nakatomi@rdc.ricoh.co.jp
³ Graduate School of Informatics, Kyoto University
ishii@i.kyoto-u.ac.jp

Abstract. A collaborative approach is of crucial importance in user modeling to improve individual prediction performance when only an insufficient amount of data is available for each user. Existing methods such as collaborative filtering or multitask learning, however, have a limitation: they cannot readily incorporate situations where individual tasks must model a complex dependency structure among task-related variables, such as that of a Bayesian network. Motivated by this issue, we propose a general approach to collaboration that can be applied to Bayesian networks, based on a simple use of the Bayesian principle. We demonstrate that the proposed method can improve both the prediction accuracy and its variance in many cases with insufficient data, in an experiment with a real-world dataset related to user modeling.

Keywords: User modeling, collaborative method, Bayesian network, Bayesian inference.

1 Introduction

Predicting users' actions based on past observations of their behavior is an important topic for developing personalized systems. Such prediction usually needs a user model that effectively represents the knowledge of a user or a group of users and is useful for the prediction. User modeling (or user profiling) [12,10,3] is currently an active research area for this aim, which seeks methods of acquiring user models and making predictions based on them. Recently, probabilistic graphical models such as Bayesian networks (BNs) [8,6] have attracted attention as an effective modeling tool in this field, because of their capacity to deal with relatively complex dependency structures among variables, which is a favorable property in general user modeling tasks.

One crucial demand in user modeling is to develop "collaborative" methods that utilize other users' information to improve the prediction for a target
user. This is due partly to the limited sample size for individual users [10], and partly to the assumption that users may share common intentions in their decisions. Consider an interactive system that is repeatedly used by many users, such as some sorts of web sites, say e-commerce sites, and publicly used electronic devices, say network printers. Not all users necessarily interact with the system actively, so there may be an insufficient amount of data to construct reliable models for some users. This is also the case with new users of the system; prediction about new users would often fail due to their limited sample sizes. Furthermore, in large-scale problems, it would be almost impossible in practice to collect a sufficient amount of data for every user. This is actually the case in many e-commerce recommendation systems [9]. Collaborative methods are particularly important to compensate for the lack of individual data by considering the relationships to other users.

Collaborative filtering (CF) [2] is probably the best-known collaborative method in the context of recommendation; it estimates users' unknown ratings of items based on the known ratings of similar users. The problem setting of rating estimation is, however, a limited one in that it does not usually consider general dependency structures among multiple variables (including ratings and content-related ones), so the usual CF methods cannot be directly applied to more general tasks. Alternatively, multitask learning (MTL) has recently been an active research topic in machine learning; it solves multiple related prediction tasks simultaneously to improve each task's performance by exploiting the generalization among the tasks. However, the application of existing MTL schemes to general graphical models is not so straightforward, especially when one considers the learning of graph structure in addition to parameters. One attempt at MTL of BNs has recently been reported in [7], in which a prior distribution over the multiple graph structures of BNs is employed to force them to have similar structures. However, the joint determination of multiple model structures is computationally expensive, and the method does not allow the heterogeneity among users that usually exists in user modeling tasks.

In this article, we propose a simple framework for collaborative user modeling, with particular interest in its application to Bayesian networks. The principal aim is to improve the typically low prediction performance in the initial phase after a user has started to interact with the system. Our approach flexibly realizes knowledge sharing among individual models that have already been learned individually, instead of following the usual MTL setting in which the models are learned simultaneously. A similar post-sharing approach has recently been investigated in a different context in [11], which focused on the selection of a relevant subset of individual models. In this article, we also evaluate the proposed method with a real dataset, which was collected in a real-world user modeling task.

2 Bayesian Network

For the general purpose of user modeling, BN is one of the key tools for representing knowledge and making predictions about users [12].

[Figure: an example DAG; the node labels A, B, D, and E are visible in the extraction]

Fig. 1. An example of DAG

While our proposed approach described in the next section is not limited to a specific learning model, we focus on its application to (discrete) BNs in this article, considering the importance of BNs in user modeling. In this section, we briefly review learning and prediction with BNs. For more details, see [8,6].

BN is a probabilistic graphical model that defines a class of parametric distributions over multiple random variables in terms of a directed acyclic graph (DAG). Fig. 1 shows an example of a DAG. Each node corresponds to a single variable, and each edge represents conditional dependence between variables. BN has relatively high representational power in conjunction with effective prediction and marginalization algorithms such as belief propagation (BP) [8] or the junction tree algorithm (JT) [4]. In addition, BN is attractive because of its human-interpretable knowledge representation in terms of a DAG.

Let v be the set of variables of interest. The probability distribution of a BN can be written, corresponding to a specific DAG G, as

p(v | θ, G) = ∏_i p(v(i) | pa_i, θ_i, G) ,  (1)
where v(i) denotes the i-th element (node i) of v, and pa_i the set of parent variables of node i. The model parameters are denoted by θ = {θ_i}, where θ_i is the set of parameters that define the local conditional distribution of v(i). Note that both pa_i and θ_i are defined under the specification of the DAG G.

The training of a BN is conducted in two steps. First, the structure of the BN is determined according to a certain scoring criterion (structure learning). Second, the conditional multinomial probability distribution of each node given its parents is estimated by Bayesian inference, assuming a conjugate Dirichlet prior (parameter learning). In this study, we assume no hidden variables. The most basic (but rather computationally expensive) scoring criterion for structure learning is then the (log-)marginal likelihood of the graph structure. Once a structure is given, the parameter learning can be done in a straightforward manner.
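As a concrete illustration of the factorization in Eq. (1), the sketch below evaluates the joint probability of a small discrete BN by multiplying the local conditionals. The graph and all CPT values are invented for the example (the paper gives no CPTs for Fig. 1), and all nodes are assumed binary.

```python
from itertools import product

# hypothetical DAG: A and B are roots, C depends on A and B, D and E on C
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"]}
# cpt[node] maps a tuple of parent values to P(node = 1 | parents)
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "D": {(0,): 0.2, (1,): 0.7},
    "E": {(0,): 0.8, (1,): 0.3},
}

def joint(v):
    # Eq. (1): p(v | theta, G) = prod_i p(v(i) | pa_i, theta_i, G)
    p = 1.0
    for node, value in v.items():
        p1 = cpt[node][tuple(v[q] for q in parents[node])]
        p *= p1 if value == 1 else 1.0 - p1
    return p

# sanity check: the joint sums to 1 over all 2^5 assignments
assert abs(sum(joint(dict(zip("ABCDE", bits)))
               for bits in product([0, 1], repeat=5)) - 1.0) < 1e-12
```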
3 Bayesian Collaborative Predictors

3.1 Prediction with Individual User Models

Consider that a user u repeatedly interacts with a target system, generating a sample v_u^n = {x_u^n, t_u^n} in the n-th usage, where x_u and t_u denote the sets of input and target variables, respectively. Given a set of observations, D_u = {v_u^n | n = 1, 2, . . . , N_u}, the individual prediction task is stated generally as predicting a new instance of the target, t_u^new, given a new input x_u^new.
In this article, we assume both x_u and t_u are discrete variables (allowing the use of a discrete BN), although our approach can be applied in a straightforward manner to other cases in which the input and/or target variables are continuous.

A key requirement of our approach is to construct an individual user model as a conditional probability distribution, p_u(t_u | x_u), where we explicitly put the subscript u to indicate the user. This can be done by any kind of probabilistic regressor or classifier, but we focus on the use of a discrete BN to realize it. Given a BN joint distribution p_u(v_u) that has already been trained on the individual dataset D_u, as described in the previous section, there are several ways to obtain the conditional distribution. One naive way is to directly calculate the conditional probabilities for all possible realizations of t_u and x_u from the BN parameters, and then normalize them for each condition of x_u. This enumeration becomes intractable as the number of variables increases, but the approach is easy to implement while enabling exact computation; it is thus useful in cases with a moderate number of variables. When there are a large number of variables, however, the exact calculation is not realistic, and other methods are required. State-of-the-art methods of probabilistic reasoning such as BP or JT can efficiently calculate the marginals of single target nodes, or of a subset/clique of target nodes, conditional on a given input, instead of the joint distribution over all target variables. The conditional distribution p_u(t_u | x_u) (which considers the joint probability over t_u) can then be approximated by the product of the resulting marginals.

Based on the conditional distribution obtained by these methods, the prediction for each individual can be made simply as its conditional mode:

t̂_u^new = argmax_{t_u} p_u(t_u | x_u = x_u^new) .  (2)

In this article, this type of prediction is referred to as individual prediction, in contrast to collaborative prediction.
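A minimal sketch of this naive exact approach and the conditional mode of Eq. (2) follows; the callable joint_prob standing in for the trained BN joint p_u(t, x) is an assumed interface, not something specified by the paper.

```python
import itertools

def predict_mode(joint_prob, x_new, target_state_spaces):
    # Enumerate every realization t of the target variables, score it with the
    # BN joint p_u(t, x_new), and return the argmax (Eq. (2)). The normalizer
    # p_u(x_new) is constant in t, so the conditional and joint argmax agree
    # and no explicit normalization is needed for prediction.
    best_t, best_p = None, float("-inf")
    for t in itertools.product(*target_state_spaces):
        p = joint_prob(t, x_new)  # proportional to p_u(t | x_new)
        if p > best_p:
            best_t, best_p = t, p
    return best_t

# usage with two binary target variables and the assumed joint interface:
# t_hat = predict_mode(my_bn_joint, x_new, [(0, 1), (0, 1)])
```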
3.2 Bayesian Collaboration of Pre-learned Predictors

Consider that the individual user models, i.e., the conditional distributions, have already been learned for U users. The number of training samples used in the learning, however, may differ considerably across users. The prediction accuracy for users with small sample sizes can be low, probably with a high uncertainty/variance of learning. The aim of this study is to deal with this problem by introducing a framework for knowledge sharing among the pre-learned individual user models, based on a Bayesian perspective.

Our approach rests on a somewhat tricky use of the Bayesian principle. For convenience of explanation, we assume for a moment a virtual agent who makes predictions about a specific user s by collecting the information of the other users.
Suppose the agent can freely access any other user's model and dataset. The available information for the agent is then the U conditional models, p_1(t_1 | x_1), p_2(t_2 | x_2), . . . , p_U(t_U | x_U), each of which is for a different task but somehow related to the others, and also the U original datasets, D_1, D_2, . . . , D_U. The trick here is to regard each of the other users' models as another hypothesis that also describes the behavior of user s. The agent then has U conditional hypotheses, p_1(t_s | x_s), p_2(t_s | x_s), . . . , p_U(t_s | x_s), for the specific prediction task of user s. If the agent is a Bayesian, the natural choice in such a situation is to compute the posterior distribution over the U hypotheses given the corresponding dataset D_s, and then form a predictive distribution by taking the posterior average over the conditional models. Using the notations T_s = {t_s^n | n = 1, 2, . . . , N_s} and X_s = {x_s^n | n = 1, 2, . . . , N_s}, the posterior distribution over the U models can be given as
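The formula itself falls outside this excerpt. Under the standard Bayesian reading of the passage, with a uniform prior over the U hypotheses the posterior weight of model m is proportional to its likelihood on D_s, and prediction averages the conditional models with these weights; the sketch below assumes exactly that form and interface.

```python
import numpy as np

def posterior_weights(models, T_s, X_s):
    # models[m] is a callable returning the conditional probability p_m(t | x);
    # assumed form with a uniform prior: p(m | D_s) ∝ Π_n p_m(t_s^n | x_s^n)
    log_w = np.array([sum(np.log(p_m(t, x)) for t, x in zip(T_s, X_s))
                      for p_m in models])
    log_w -= log_w.max()  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def predictive(models, weights, t, x_new):
    # posterior-averaged predictive probability of target t for a new input
    return sum(w * p_m(t, x_new) for w, p_m in zip(weights, models))
```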