Mashinali o‘qitishga kirish Nosirov Xabibullo xikmatullo o‘gli Falsafa doktori (PhD), tret kafedrasi mudiri

1 2 3 4 5 6 7 8 9 10

Bog'liq
Mashinali o\'qitishga kirish 20-ma\'ruza Nosirov Kh

Cluster Analysis

K-Means and Hierarchical Clustering
The Statistics and Machine Learning Toolbox includes functions to perform K-means clustering and hierarchical clustering.
K-means clustering is a partitioning method that treats observations in your data as objects having locations and distances from each other. It partitions the objects into K mutually exclusive clusters, such that objects within each cluster are as close to each other as possible, and as far from objects in other clusters as possible. Each cluster is characterized by its centroid, or center point. Of course, the distances used in clustering often do not represent spatial distances.

Hierarchical clustering is a way to investigate grouping in your data, simultaneously over a variety of scales of distance, by creating a cluster tree. The tree is not a single set of clusters, as in K-Means, but rather a multi-level hierarchy, where clusters at one level are joined as clusters at the next higher level. This allows you to decide what scale or level of clustering is most appropriate in your application.
Some of the functions used in this example call MATLAB® built-in random number generation functions. To duplicate the exact results shown in this example, you should execute the command below, to set the random number generator to a known state. If you do not set the state, your results may differ in trivial ways, for example, you may see clusters numbered in a different order. There is also a chance that a suboptimal cluster solution may result (the example includes a discussion of suboptimal solutions, including ways to avoid them).

rng(6,'twister')

In the 1920's, botanists collected measurements on the sepal length, sepal width, petal length, and petal width of 150 iris specimens, 50 from each of three species. The measurements became known as Fisher's iris data set.
Each observation in this data set comes from a known species, and so there is already an obvious way to group the data. For the moment, we will ignore the species information and cluster the data using only the raw measurements. When we are done, we can compare the resulting clusters to the actual species, to see if the three types of iris possess distinct characteristics.

Do'stlaringiz bilan baham:

1 2 3 4 5 6 7 8 9 10