Simple Linear Classifier
Predictive Accuracy I
Predictive Accuracy II
Interpretability
Bayes Classifiers
Bayesian classifiers use Bayes' theorem, which says
p(cj | d) = p(d | cj) p(cj) / p(d)
- p(cj | d) = probability of instance d being in class cj. This is what we are trying to compute.
- p(d | cj) = probability of generating instance d given class cj. We can imagine that being in class cj causes you to have feature d with some probability.
- p(cj) = probability of occurrence of class cj. This is just how frequent the class cj is in our database.
- p(d) = probability of instance d occurring. This can actually be ignored, since it is the same for all classes.
Assume that we have two classes: c1 = male and c2 = female. We have a person whose sex we do not know, say "drew" or d. Classifying drew as male or female is equivalent to asking whether it is more probable that drew is male or female, i.e., which is greater: p(male | drew) or p(female | drew).
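As a worked illustration (with hypothetical counts, not taken from the slides), suppose a training table of 8 people contains 5 males and 3 females, of whom 1 male and 2 females are named drew. A minimal sketch in Python:

```python
p_male = 5 / 8                      # p(male): prior from class frequencies
p_female = 3 / 8                    # p(female)
p_drew_given_male = 1 / 5           # p(drew | male)
p_drew_given_female = 2 / 3         # p(drew | female)

# Numerators of Bayes' theorem; p(drew) is the same for both classes,
# so it can be ignored when we only compare the two posteriors.
score_male = p_drew_given_male * p_male        # = 0.125
score_female = p_drew_given_female * p_female  # = 0.25

print("male" if score_male > score_female else "female")   # -> female
```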
To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate
p(d | cj) = p(d1 | cj) * p(d2 | cj) * ... * p(dn | cj)
Naïve Bayesian Classifier
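To make the computation above concrete, here is a minimal naïve Bayes sketch for categorical attributes (the helper names are illustrative, and no smoothing is applied, so an unseen attribute value zeroes out a class):

```python
from collections import Counter, defaultdict

def train_naive_bayes(instances, labels):
    """Estimate p(cj) from class frequencies and p(di | cj) from attribute counts.
    instances: list of attribute tuples, labels: list of class labels."""
    priors = Counter(labels)
    cond = defaultdict(Counter)              # (class, attribute index) -> value counts
    for x, c in zip(instances, labels):
        for i, v in enumerate(x):
            cond[(c, i)][v] += 1
    return priors, cond, len(labels)

def classify(x, priors, cond, n):
    """Pick the class maximizing p(cj) * p(d1|cj) * ... * p(dn|cj)."""
    best_class, best_score = None, -1.0
    for c, count_c in priors.items():
        score = count_c / n                      # p(cj)
        for i, v in enumerate(x):
            score *= cond[(c, i)][v] / count_c   # naive independence assumption
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```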
Summary of Classification
Two Types of Clustering
Desirable Properties of a Clustering Algorithm
A Useful Tool for Summarizing Similarity Measurements
Up to this point we have simply assumed that we can measure similarity, but how do we measure similarity?
K-means Clustering: Step 1
K-means Clustering: Step 2
K-means Clustering: Step 3
K-means Clustering: Step 4
K-means Clustering: Step 5
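A minimal sketch of the loop behind steps 1-5, for 2-D points given as (x, y) tuples (random initialization and squared Euclidean distance are assumptions of this sketch):

```python
import random

def kmeans(points, k, max_iter=100):
    """Minimal k-means sketch for a list of 2-D points."""
    centers = random.sample(points, k)                     # step 1: pick k initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):                              # steps 2-5: iterate until stable
        clusters = [[] for _ in range(k)]
        for p in points:                                   # assign each point to nearest center
            j = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                                            + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        new_centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                       if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:                         # stop when centers no longer move
            break
        centers = new_centers
    return centers, clusters
```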
Comments on the K-Means Method Strength - Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n.
- Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms
Weakness - Applicable only when a mean is defined; what about categorical data?
- Need to specify k, the number of clusters, in advance
- Unable to handle noisy data and outliers
- Not suitable for discovering clusters with non-convex shapes
The K-Medoids Clustering Method Find representative objects, called medoids, in clusters PAM (Partitioning Around Medoids, 1987) - starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering
- PAM works effectively for small data sets, but does not scale well for large data sets
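A simplified sketch of PAM's swap step, assuming a user-supplied distance function; re-evaluating the total cost for every (medoid, non-medoid) pair is what keeps it from scaling to large data sets:

```python
def total_cost(points, medoids, dist):
    """Sum of each point's distance to its closest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, dist):
    """Simplified PAM sketch: replace a medoid with a non-medoid
    whenever the swap lowers the total distance of the clustering."""
    medoids = list(points[:k])                      # naive initial medoids
    improved = True
    while improved:
        improved = False
        for m in list(medoids):
            for p in points:
                if p in medoids:
                    continue
                candidate = [p if x == m else x for x in medoids]
                if total_cost(points, candidate, dist) < total_cost(points, medoids, dist):
                    medoids, improved = candidate, True
    return medoids
```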
Partitional Clustering Algorithms Clustering algorithms have been designed to handle very large datasets E.g. the BIRCH algorithm - Main idea: use an in-memory R-tree to store points that are being clustered
- Insert points one at a time into the R-tree, merging a new point with an existing cluster if it is less than some distance away
- If there are more leaf nodes than fit in memory, merge existing clusters that are close to each other
- At the end of first pass we get a large number of clusters at the leaves of the R-tree
- Merge clusters to reduce the number of clusters
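A very simplified sketch of this single-pass idea: the real algorithm keeps cluster summaries in an in-memory tree and merges clusters when memory runs out, while here a flat list of (count, sum) summaries and a fixed distance threshold stand in for that machinery:

```python
def single_pass_cluster(points, threshold):
    """Absorb each point into the nearest existing cluster if its centroid
    is within `threshold`; otherwise start a new cluster."""
    clusters = []                                     # each entry: [count, sum_x, sum_y]
    for x, y in points:
        best, best_d = None, threshold
        for c in clusters:
            cx, cy = c[1] / c[0], c[2] / c[0]         # centroid of the cluster summary
            d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            clusters.append([1, x, y])                # new cluster at the leaf level
        else:
            best[0] += 1                              # merge point into cluster summary
            best[1] += x
            best[2] += y
    return clusters
```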
Association Rules (market basket analysis) Retail shops are often interested in associations between different items that people buy. - Someone who buys bread is quite likely also to buy milk
- A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.
Association information can be used in several ways. - E.g. when a customer buys a particular book, an online shop may suggest associated books.
Association rules: bread ⇒ milk; DB-Concepts, OS-Concepts ⇒ Networks - Left hand side: antecedent, right hand side: consequent
- An association rule must have an associated population; the population consists of a set of instances
- E.g. each transaction (sale) at a shop is an instance, and the set of all transactions is the population
Set of items: I = {I1, I2, …, Im} Transactions: D = {t1, t2, …, tn}, tj ⊆ I Itemset: {Ii1, Ii2, …, Iik} ⊆ I Support of an itemset: percentage of transactions which contain that itemset. Large (Frequent) itemset: itemset whose number of occurrences is above a threshold.
Association Rules Example
Association Rule Definitions Association Rule (AR): implication X ⇒ Y where X, Y ⊆ I and X ∩ Y = ∅; Support of AR (s) X ⇒ Y: percentage of transactions that contain X ∪ Y; Confidence of AR (α) X ⇒ Y: ratio of the number of transactions that contain X ∪ Y to the number that contain X
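A minimal sketch of these two definitions in Python, with a made-up toy basket set for illustration:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """Confidence of the rule X => Y: support(X u Y) / support(X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "eggs"}, {"milk"}]
print(support({"bread", "milk"}, baskets))        # 0.5
print(confidence({"bread"}, {"milk"}, baskets))   # 0.666...
```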
Association Rule Problem Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn} where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I, the Association Rule Problem is to identify all association rules X ⇒ Y with a minimum support and confidence (supplied by the user). NOTE: Support of X ⇒ Y is the same as support of X ∪ Y.
Association Rule Algorithm (Basic Idea) Find Large Itemsets. Generate rules from frequent itemsets.
Association Rule Algorithm
Apriori Algorithm Large Itemset Property: Any subset of a large itemset is large. Contrapositive: If an itemset is not large, none of its supersets are large.
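A minimal sketch of the level-wise Apriori idea, using the large itemset property to prune candidates (transactions are assumed to be given as item sets and min_support as a fraction):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise generation of large (frequent) itemsets with subset pruning."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items
               if sum(s <= t for t in transactions) / n >= min_support}
    large, k = [], 1
    while current:
        large.extend(current)
        k += 1
        # Join step: build size-k candidates from the size-(k-1) large itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Keep only the candidates whose support clears the threshold.
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
    return large

baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "milk"}, {"milk"}]
print(apriori(baskets, min_support=0.5))   # e.g. {bread}, {milk}, {bread, milk}
```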
Large Itemset Property
Conclusions