3 Structure of Learning System

The most general setting in which recommender systems are studied is presented in Figure 1. Known user preferences are represented as a matrix of $n$ users and $m$ items, where each cell $r_{u,i}$ corresponds to the rating given to item $i$ by user $u$. This user ratings matrix is typically sparse, as most users do not rate most items. The recommendation task is to predict what rating a user would give to a previously unrated item. Typically, ratings are predicted for all items that have not been observed by a user, and the highest rated items are presented as recommendations. The user under current consideration for recommendations is referred to as the active user.

The myriad approaches to recommender systems can be broadly categorized as:

• Collaborative Filtering (CF): In CF systems, a user is recommended items based on the past ratings of all users collectively.
• Content-based recommending: These approaches recommend items that are similar in content to items the user has liked in the past, or matched to attributes of the user.
• Hybrid approaches: These methods combine both collaborative and content-based approaches.

3.1 Collaborative Filtering

Collaborative Filtering (CF) systems work by collecting user feedback in the form of ratings for items in a given domain and exploiting similarities in rating behaviour amongst several users in determining how to recommend an item. CF methods can be further sub-divided into neighborhood-based and model-based approaches. Neighborhood-based methods are also commonly referred to as memory-based approaches [5].

3.1.1 Neighborhood-based Collaborative Filtering

In neighborhood-based techniques, a subset of users is chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for this user. Most of these approaches can be generalized by the algorithm summarized in the following steps:

1. Assign a weight to all users with respect to similarity with the active user.
2. Select the $k$ users that have the highest similarity with the active user, commonly called the neighborhood.
3. Compute a prediction from a weighted combination of the selected neighbors' ratings.

In step 1, the weight $w_{a,u}$ is a measure of similarity between the user $u$ and the active user $a$. The most commonly used measure of similarity is the Pearson correlation coefficient between the ratings of the two users [30], defined below:

$$w_{a,u} = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2 \sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}} \tag{1}$$

where $I$ is the set of items rated by both users, $r_{u,i}$ is the rating given to item $i$ by user $u$, and $\bar{r}_u$ is the mean rating given by user $u$.

In step 3, predictions are generally computed as the weighted average of deviations from the neighbor's mean, as in:

$$p_{a,i} = \bar{r}_a + \frac{\sum_{u \in K} (r_{u,i} - \bar{r}_u) \times w_{a,u}}{\sum_{u \in K} w_{a,u}} \tag{2}$$

where $p_{a,i}$ is the prediction for the active user $a$ for item $i$, $w_{a,u}$ is the similarity between users $a$ and $u$, and $K$ is the neighborhood, or set of most similar users.

Similarity based on Pearson correlation measures the extent to which there is a linear dependence between two variables. Alternatively, one can treat the ratings of two users as vectors in an $m$-dimensional space, and compute similarity based on the cosine of the angle between them, given by:

$$w_{a,u} = \cos(\vec{r}_a, \vec{r}_u) = \frac{\vec{r}_a \cdot \vec{r}_u}{\|\vec{r}_a\|_2 \times \|\vec{r}_u\|_2} = \frac{\sum_{i=1}^{m} r_{a,i}\, r_{u,i}}{\sqrt{\sum_{i=1}^{m} r_{a,i}^2} \sqrt{\sum_{i=1}^{m} r_{u,i}^2}} \tag{3}$$

When computing cosine similarity, one cannot have negative ratings, and unrated items are treated as having a rating of zero. Empirical studies [5] have found that Pearson correlation generally performs better. Several other similarity measures have been used in the literature, including Spearman rank correlation, Kendall's $\tau$ correlation, mean squared differences, entropy, and adjusted cosine similarity [36, 12].
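The three steps and Equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the nested-dictionary layout, the function names, and the toy ratings are all hypothetical. Each user's mean is taken over all items that user has rated, and only users who have rated the target item are considered as neighbors, since Equation (2) needs their rating for that item. Equation (2) is followed literally, so a neighborhood with negative weights can make the denominator small, and predictions are not clipped to the rating scale.

```python
from math import sqrt

# Hypothetical layout: user -> {item: rating}. Missing items are unrated.
Ratings = dict[str, dict[str, float]]


def mean_rating(user_ratings: dict[str, float]) -> float:
    """Mean over every item the user has rated (an assumption of this sketch)."""
    return sum(user_ratings.values()) / len(user_ratings)


def pearson(ra: dict[str, float], ru: dict[str, float]) -> float:
    """Equation (1): Pearson correlation over the co-rated item set I."""
    common = ra.keys() & ru.keys()
    if not common:
        return 0.0
    mean_a, mean_u = mean_rating(ra), mean_rating(ru)
    num = sum((ra[i] - mean_a) * (ru[i] - mean_u) for i in common)
    den = sqrt(sum((ra[i] - mean_a) ** 2 for i in common)
               * sum((ru[i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0


def cosine(ra: dict[str, float], ru: dict[str, float]) -> float:
    """Equation (3): unrated items count as zero, so they drop out of the dot product."""
    dot = sum(r * ru[i] for i, r in ra.items() if i in ru)
    norm_a = sqrt(sum(r * r for r in ra.values()))
    norm_u = sqrt(sum(r * r for r in ru.values()))
    return dot / (norm_a * norm_u) if norm_a and norm_u else 0.0


def predict(ratings: Ratings, active: str, item: str, k: int = 2, sim=pearson) -> float:
    """Steps 1 to 3: weight users, keep the k most similar, apply Equation (2)."""
    ra = ratings[active]
    # Step 1: similarity weight for every other user who rated the target item.
    weights = {u: sim(ra, ru) for u, ru in ratings.items()
               if u != active and item in ru}
    # Step 2: the neighborhood K is the k users with the highest weight.
    neighbors = sorted(weights, key=weights.get, reverse=True)[:k]
    # Step 3: weighted average of the neighbors' deviations from their own means.
    num = sum((ratings[u][item] - mean_rating(ratings[u])) * weights[u]
              for u in neighbors)
    den = sum(weights[u] for u in neighbors)
    mean_a = mean_rating(ra)
    return mean_a + num / den if den else mean_a


# Toy data, invented purely for illustration.
example_ratings: Ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
    "dave":  {"a": 4, "b": 3, "c": 5, "d": 4},
}

print(predict(example_ratings, "alice", "d", k=2))
```

Passing `sim=cosine` swaps in Equation (3) without touching the rest of the procedure, which makes it easy to compare the two measures on the same data.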
Below we discuss several extensions to neighborhood-based CF, which have led to improved performance.