By plotting the raw data, you can see the differences in the cluster shapes created using the two different distances. The two solutions are similar, but the two upper clusters are elongated in the direction of the origin when using cosine distance. for i = 1:3 clust = find(cidxCos==i); plot3(meas(clust,1),meas(clust,2),meas(clust,3),ptsymb{i}); hold on end hold off xlabel('Sepal Length'); ylabel('Sepal Width'); zlabel('Petal Length'); view(-137,10); grid on Clustering Fisher's Iris Data Using K-Means Clustering This plot does not include the cluster centroids, because a centroid with respect to the cosine distance corresponds to a half-line from the origin in the space of the raw data. However, you can make a parallel coordinate plot of the normalized data points to visualize the differences between cluster centroids. lnsymb = {'b-','r-','m-'}; names = {'SL','SW','PL','PW'}; meas0 = meas ./ repmat(sqrt(sum(meas.^2,2)),1,4); ymin = min(min(meas0)); ymax = max(max(meas0)); for i = 1:3 subplot(1,3,i); plot(meas0(cidxCos==i,:)',lnsymb{i}); hold on; plot(cmeansCos(i,:)','k-','LineWidth',2); hold off; title(sprintf('Cluster %d',i)); xlim([.9, 4.1]); ylim([ymin, ymax]); h_gca = gca; h_gca.XTick = 1:4; h_gca.XTickLabel = names; end Clustering Fisher's Iris Data Using K-Means Clustering Because we know the species of each observation in the data, you can compare the clusters discovered by kmeans to the actual species, to see if the three species have discernibly different physical characteristics. In fact, as the following plot shows, the clusters created using cosine distance differ from the species groups for only five of the flowers. Those five points, plotted with stars, are all near the boundary of the upper two clusters. subplot(1,1,1); for i = 1:3 clust = find(cidxCos==i); plot3(meas(clust,1),meas(clust,2),meas(clust,3),ptsymb{i}); hold on end
Do'stlaringiz bilan baham: |