The Major Data Mining Tasks










































Simple Linear Classifier



















Predictive Accuracy I



Predictive Accuracy II















Interpretability




















































































Bayes Classifiers

  • Bayesian classifiers use Bayes theorem, which says

  • p(cj | d) = p(d | cj) p(cj) / p(d)

  • p(cj | d) = probability of instance d being in class cj. This is what we are trying to compute.

  • p(d | cj) = probability of generating instance d given class cj. We can imagine that being in class cj causes an instance to have feature d with some probability.

  • p(cj) = probability of occurrence of class cj. This is just how frequent the class cj is in our database.

  • p(d) = probability of instance d occurring. This can actually be ignored, since it is the same for all classes.



Assume that we have two classes

  • Assume that we have two classes: c1 = male and c2 = female.

  • We have a person whose sex we do not know, say “drew” or d.

  • Classifying drew as male or female is equivalent to asking which is more probable, i.e. which is greater: p(male | drew) or p(female | drew)? (A worked computation with hypothetical counts follows.)
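As a worked illustration (the counts below are hypothetical stand-ins, not the figures from the slides' original example table), suppose our database holds 8 people, 3 male and 5 female, and that 1 of the males and 2 of the females are named drew. Bayes' theorem then gives

  p(male | drew)   = p(drew | male) p(male) / p(drew)     = (1/3 × 3/8) / p(drew) = 0.125 / p(drew)
  p(female | drew) = p(drew | female) p(female) / p(drew) = (2/5 × 5/8) / p(drew) = 0.250 / p(drew)

Since p(drew) is the same in both expressions, it can be ignored, and drew would be classified as female.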











To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate

  • p(d|cj) = p(d1|cj) * p(d2|cj) * … * p(dn|cj)






Naïve Bayesian Classifier

  • (Diagram: each attribute contributes one factor p(d1|cj), p(d2|cj), …, p(dn|cj); multiplying these factors together with p(cj) scores how well instance d fits class cj. A minimal code sketch of this classifier follows.)
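The factorization above translates directly into code. Below is a minimal sketch of a naïve Bayesian classifier for categorical attributes (the function names and toy data are ours, not from the slides): class priors p(cj) and the per-attribute terms p(di|cj) are estimated by counting, and p(d) is dropped since it is the same for every class.

```python
from collections import Counter, defaultdict

def train_naive_bayes(instances, labels):
    """Estimate p(cj) and p(di|cj) by simple counting over the training data."""
    class_counts = Counter(labels)
    # feature_counts[cj][i][value] = number of class-cj instances whose attribute i equals value
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(instances, labels):
        for i, value in enumerate(x):
            feature_counts[c][i][value] += 1
    return class_counts, feature_counts

def classify(x, class_counts, feature_counts):
    """Return the class cj that maximizes p(cj) * prod_i p(di|cj); p(d) is ignored."""
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                        # p(cj)
        for i, value in enumerate(x):
            # p(di|cj); no smoothing here, so an unseen value gives probability 0
            score *= feature_counts[c][i][value] / n_c
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical usage: two categorical attributes, classes male/female.
X = [("long", "short"), ("short", "tall"), ("long", "medium"), ("short", "medium")]
y = ["female", "male", "female", "male"]
model = train_naive_bayes(X, y)
print(classify(("long", "medium"), *model))   # -> "female"
```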










Summary of Classification

















Two Types of Clustering



Desirable Properties of a Clustering Algorithm



A Useful Tool for Summarizing Similarity Measurements



















(How-to) Hierarchical Clustering



















Up to this point we have simply assumed that we can measure similarity, but how do we measure similarity?
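One common answer, given here only as an illustration (the slides' own examples are in the missing figures), is to represent each object as a vector of numeric features and use Euclidean distance: the smaller the distance, the more similar the objects.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((1.0, 2.0), (4.0, 6.0)))   # -> 5.0
```

Other kinds of objects call for other measures (e.g. edit distance for strings), which is why similarity is treated as a pluggable ingredient of the clustering algorithms that follow.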













K-means Clustering: Step 1



K-means Clustering: Step 2



K-means Clustering: Step 3



K-means Clustering: Step 4



K-means Clustering: Step 5



Comments on the K-Means Method

  • Strength

    • Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations. Normally, k, t << n. (A minimal sketch of the algorithm follows this list.)
    • Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
  • Weakness

    • Applicable only when a mean is defined; what about categorical data?
    • Need to specify k, the number of clusters, in advance
    • Unable to handle noisy data and outliers
    • Not suitable for discovering clusters with non-convex shapes
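The sketch below is our own illustration of the five steps just shown (it is not code from the slides): pick k initial centres, assign every object to its nearest centre, recompute each centre as the mean of its cluster, and repeat until the assignment stops changing.

```python
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster 2-D points into k groups by iteratively refining cluster means."""
    random.seed(seed)
    centres = random.sample(points, k)               # Step 1: choose k initial centres
    assignment = [None] * len(points)
    for _ in range(max_iterations):                  # each iteration costs O(kn)
        # Step 2: assign every point to its nearest centre
        new_assignment = [
            min(range(k), key=lambda c: (p[0] - centres[c][0]) ** 2
                                        + (p[1] - centres[c][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:             # Step 5: stop when nothing moves
            break
        assignment = new_assignment
        # Steps 3-4: recompute each centre as the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centres, assignment

data = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
print(k_means(data, k=2))
```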


The K-Medoids Clustering Method

  • Find representative objects, called medoids, in clusters

  • PAM (Partitioning Around Medoids, 1987)

    • starts from an initial set of medoids and iteratively replaces one medoid by a non-medoid whenever the swap reduces the total distance of the resulting clustering (a small sketch of this swap loop follows)
    • PAM works effectively for small data sets, but does not scale well to large data sets
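For illustration only, here is a brute-force version of that swap loop (names, toy data, and structure are ours; real PAM implementations avoid recomputing the full cost for every candidate swap):

```python
import itertools

def total_distance(points, medoids, dist):
    """Cost of a clustering: each point contributes its distance to the nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, dist):
    """Partitioning Around Medoids: greedy medoid/non-medoid swaps until no swap helps."""
    medoids = list(points[:k])                        # arbitrary initial medoids
    improved = True
    while improved:
        improved = False
        cost = total_distance(points, medoids, dist)
        for i, candidate in itertools.product(range(k), points):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            trial_cost = total_distance(points, trial, dist)
            if trial_cost < cost:                     # accept a swap that improves total distance
                medoids, cost, improved = trial, trial_cost, True
    return medoids

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
data = [(1, 1), (2, 1), (8, 8), (9, 7), (2, 2), (8, 9)]
print(pam(data, k=2, dist=manhattan))
```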


























Partitional Clustering Algorithms

  • Clustering algorithms have been designed to handle very large datasets

  • E.g. the Birch algorithm

    • Main idea: use an in-memory R-tree to store the points that are being clustered
    • Insert points one at a time into the R-tree, merging a new point with an existing cluster if it is less than some threshold distance away (a simplified sketch of this first pass follows)
    • If there are more leaf nodes than fit in memory, merge existing clusters that are close to each other
    • At the end of the first pass we get a large number of clusters at the leaves of the R-tree
      • Merge clusters to reduce the number of clusters
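The following is only a toy sketch of that first-pass idea, incremental threshold-based merging; it keeps none of Birch's tree structure or compact cluster summaries, and the names are ours.

```python
def incremental_clusters(points, threshold):
    """One pass over the data: absorb each point into a nearby cluster or start a new one."""
    clusters = []  # each cluster: [sum_x, sum_y, count] -- enough to recompute its centre
    for x, y in points:
        nearest, best = None, threshold
        for c in clusters:
            cx, cy = c[0] / c[2], c[1] / c[2]
            d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if d < best:
                nearest, best = c, d
        if nearest is None:                 # too far from every cluster: start a new one
            clusters.append([x, y, 1])
        else:                               # close enough: merge the point into that cluster
            nearest[0] += x
            nearest[1] += y
            nearest[2] += 1
    return [(sx / n, sy / n, n) for sx, sy, n in clusters]

print(incremental_clusters([(1, 1), (1.2, 0.9), (5, 5), (5.1, 4.8)], threshold=1.0))
```

Each cluster is stored as running coordinate sums plus a count, which is enough to recompute its centre cheaply as points are absorbed.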















Association Rules (market basket analysis)

  • Retail shops are often interested in associations between different items that people buy.

    • Someone who buys bread is quite likely also to buy milk
    • A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.
  • Association information can be used in several ways.

    • E.g. when a customer buys a particular book, an online shop may suggest associated books.
  • Association rules:

  • bread ⇒ milk        DB-Concepts, OS-Concepts ⇒ Networks

    • Left-hand side: antecedent; right-hand side: consequent
    • An association rule must have an associated population; the population consists of a set of instances
      • E.g. each transaction (sale) at a shop is an instance, and the set of all transactions is the population


Association Rule Definitions

  • Set of items: I={I1,I2,…,Im}

  • Transactions: D={t1, t2, …, tn}, tj ⊆ I

  • Itemset: {Ii1, Ii2, …, Iik} ⊆ I

  • Support of an itemset: Percentage of transactions which contain that itemset.

  • Large (Frequent) itemset: Itemset whose number of occurrences is above a threshold.



Association Rules Example



Association Rule Definitions

  • Association Rule (AR): an implication X ⇒ Y where X, Y ⊆ I and X ∩ Y = ∅ (the empty set)

  • Support of AR (s) X ⇒ Y: Percentage of transactions that contain X ∪ Y

  • Confidence of AR (α) X ⇒ Y: Ratio of the number of transactions that contain X ∪ Y to the number that contain X (a small computation of s and α on a toy transaction set follows)



Association Rules Example



Association Rules Example



Association Rule Problem

  • Given a set of items I={I1, I2, …, Im} and a database of transactions D={t1, t2, …, tn} where ti={Ii1, Ii2, …, Iik} and Iij ∈ I, the Association Rule Problem is to identify all association rules X ⇒ Y with a minimum support and confidence (supplied by the user).

  • NOTE: The support of X ⇒ Y is the same as the support of X ∪ Y.



Association Rule Algorithm (Basic Idea)

  • Find large (frequent) itemsets.

  • Generate rules from the large itemsets.



Association Rule Algorithm





Apriori Algorithm

  • Large Itemset Property: any subset of a large itemset is large.

  • Contrapositive: if an itemset is not large, none of its supersets are large.

  • This property lets a level-wise algorithm build candidate (k+1)-itemsets only from the large k-itemsets, pruning everything else (see the sketch below).
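A minimal sketch of the large-itemset phase of Apriori (illustrative only; the candidate-generation shortcut, function names, and toy data are our own simplifications):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: a (k+1)-itemset can only be large if all its k-subsets are large."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    # L1: large 1-itemsets
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    large = list(current)
    k = 2
    while current:
        # Candidate generation: unions of large (k-1)-itemsets that yield a k-itemset
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates having a non-large (k-1)-subset (the large itemset property)
        candidates = {c for c in candidates
                      if all(frozenset(s) in set(current) for s in combinations(c, k - 1))}
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support]
        large.extend(current)
        k += 1
    return large

D = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "jam"}, {"milk", "eggs"}]
print(apriori(D, min_support=0.5))
```

Rules would then be generated from each large itemset by checking the confidence of every antecedent/consequent split.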



Large Itemset Property



Large Itemset Property



Conclusions



