# Dissimilarities and matching between symbolic objects

Prof. Donato Malerba

## Several data analysis techniques are based on quantifying a dissimilarity (or similarity) measure between multivariate data.

• Clustering
• Discriminant analysis
• Visualization-based approaches

## The dissimilarity measures presented here are among those investigated in the ASSO Project.

## A case study

## The construction of SO

## TABLE OF BOOLEAN SYMBOLIC OBJECTS

## Dissimilarity matrix

## The MID property

## Let a and b be two Boolean symbolic objects (BSOs), $a = [Y_1 \in A_1] \wedge \dots \wedge [Y_p \in A_p]$ and $b = [Y_1 \in B_1] \wedge \dots \wedge [Y_p \in B_p]$, where each variable $Y_j$ takes values in $\mathcal{Y}_j$ and $A_j$ and $B_j$ are subsets of $\mathcal{Y}_j$. We are interested in computing the dissimilarity $d(a,b)$.

## if $[Y_j = S_j]$ then $[Y_i = S_i]$

## DISSIMILARITY AND SIMILARITY MEASURES

• Dissimilarity measure
• $d: E \times E \to \mathbb{R}$ such that $d^*_a = d(a,a) \le d(a,b) = d(b,a) < \infty \quad \forall a, b \in E$
• Similarity measure
• $s: E \times E \to \mathbb{R}$ such that $s^*_a = s(a,a) \ge s(a,b) = s(b,a) \ge 0 \quad \forall a, b \in E$
• Generally:
• $\forall a \in E$: $d^*_a = d^*$ and $s^*_a = s^*$, and specifically $d^* = 0$ while $s^* = 1$
• Dissimilarity measures can be transformed into similarity measures (and vice versa):
• $d = \varphi(s)$ ($s = \varphi^{-1}(d)$)
• where:
• $\varphi(s)$ is a strictly decreasing function, with $\varphi(1) = 0$ and $\varphi(0) = \infty$

## DISSIMILARITY AND SIMILARITY MEASURES: PROPERTIES

## SO: for both constrained and unconstrained BSOs

## U_1

## No proof is reported for the triangle inequality property

## where $0 \le \gamma \le 0.5$ and $|A_j|$ is defined depending on variable types.

## U_4

## where $\mu(V_j)$ is either the cardinality of the set $V_j$ (if $Y_j$ is a nominal variable) or the length of the interval $V_j$ (if $Y_j$ is a continuous variable).

## SO_1

## SO_2

## where $\mu(V_j)$ is either the cardinality of the set $V_j$ (if $Y_j$ is a nominal variable) or the length of the interval $V_j$ (if $Y_j$ is a continuous variable).

## The triangle inequality does not hold for SO_3 and SO_4, which are equivalent. SO_5 is a metric.

## DISSIMILARITY MEASURES FOR CONSTRAINED BSOs

## A similar extension exists for hierarchical dependencies.

## If all BSOs are coherent, then the dissimilarity measures do not change.

## Matching two structures is a problem common to many domains, such as symbolic classification, pattern recognition, data mining and expert systems.

## In the ASSO software two matching operators for BSOs have been defined.

## it happens that:

• Match(a,b) = 1 if $B_i \subseteq A_i$ for each $i = 1, 2, \ldots, p$,
• Match(a,b) = 0 otherwise.

## Indiv2 = [profession=salesman] ∧ [age=[27,28]]

• Match(District1,Indiv1) = 1
• Match(District1,Indiv2) = 0

## The canonical matching function satisfies two out of three properties of a similarity measure:

• $\forall a, b \in E$: Match(a, b) $\ge 0$
• $\forall a, b \in E$: Match(a, a) $\ge$ Match(a, b)

## while it does not satisfy the commutativity or symmetry property:

• $\forall a, b \in E$: Match(a, b) = Match(b, a)
## because of the different role played by a and b.

## flexible_matching: $E \times E \to [0,1]$

## that is, flexible_matching(a,b) equals the maximum conditional probability over the space of BSOs canonically matched by a.

## Six rules generated by Quinlan's C4.5 system

## Both matching operators can be considered in order to test the validity of the induced rules.

## d(a,b) = 1 − (flexible_matching(a,b) + flexible_matching(b,a))/2

## Define coefficients measuring the divergence between two probability distributions

## Symmetrize the non-symmetric coefficients

• $m_{\mathrm{sym}}(P,Q) = m(Q,P) + m(P,Q)$
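The symmetrization step, summing m(Q,P) and m(P,Q), can be illustrated with a short Python sketch. The Kullback-Leibler divergence is used here only as an assumed example of a non-symmetric coefficient; the slides do not say which coefficients are symmetrized, and the names `kl_divergence` and `symmetrize` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence m(P, Q) for discrete distributions.

    Assumed example coefficient; requires q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetrize(m):
    """Return the coefficient (P, Q) -> m(Q, P) + m(P, Q)."""
    return lambda p, q: m(q, p) + m(p, q)


# Hypothetical distributions over two categories.
P = [0.5, 0.5]
Q = [0.9, 0.1]

sym_kl = symmetrize(kl_divergence)

print(kl_divergence(P, Q), kl_divergence(Q, P))  # the two values differ
print(sym_kl(P, Q), sym_kl(Q, P))                # the two values coincide
```

The sum is symmetric by construction, whatever coefficient is plugged in, which is why the same wrapper could be applied to any other non-symmetric divergence.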

## PSO Dissimilarity measures

## This possibility should be taken with great care!

## D. Malerba, F. Esposito, V. Gioviale, & V. Tamma. Comparing Dissimilarity Measures in Symbolic Data Analysis. Pre-Proceedings of EKT-NTTS, vol. 1, pp. 473-481.

## Other project reports

## Developer: Dipartimento di Informatica, University of Bari, Italy.

## TWO USE CASE DIAGRAMS

## The user can select a subset of variables $Y_i$ on which the dissimilarity measure or the matching operator has to be computed.

## The user can select a number of parameters.

## abalone output file

## Output report file

## Visualization of the dissimilarity table

## Visualization of a line graph of dissimilarities

## Visualization of a scatterplot of Sammon's nonlinear mapping into a bidimensional space
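The canonical matching operator defined earlier (Match(a,b) = 1 when every B_i is included in A_i, 0 otherwise), restricted to a user-selected subset of variables, might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ASSO implementation: BSOs are represented as dictionaries, nominal values as sets, intervals as `(low, high)` tuples, and the function names are hypothetical. Indiv2 is taken from the slides; the values of District1 and Indiv1 are invented for the example, since their definitions are not given.

```python
def contains(outer, inner):
    """Return True if the value set `inner` is included in `outer`.

    Intervals are (low, high) tuples; nominal value sets are Python sets.
    """
    if isinstance(outer, tuple) and isinstance(inner, tuple):
        return outer[0] <= inner[0] and inner[1] <= outer[1]
    return inner <= outer  # set inclusion for nominal variables


def canonical_match(a, b, variables=None):
    """Match(a, b): 1 if B_i is included in A_i for every selected variable, else 0."""
    selected = list(a) if variables is None else variables
    return int(all(contains(a[v], b[v]) for v in selected))


# Indiv2 comes from the text; District1 and Indiv1 are hypothetical values.
district1 = {"profession": {"farmer", "driver"}, "age": (18, 65)}
indiv1 = {"profession": {"farmer"}, "age": (40, 40)}
indiv2 = {"profession": {"salesman"}, "age": (27, 28)}

print(canonical_match(district1, indiv1))                     # 1
print(canonical_match(district1, indiv2))                     # 0
print(canonical_match(indiv1, district1))                     # 0: not symmetric
print(canonical_match(district1, indiv2, variables=["age"]))  # 1
```

The third call shows why the symmetry property fails: swapping the arguments swaps which object plays the role of the description and which the role of the instance.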
