Advances in K-means Clustering: a Data Mining Thinking by Junjie Wu

By Junjie Wu

Nearly everyone knows the K-means algorithm in the fields of data mining and business intelligence. But the ever-emerging data with extremely complicated characteristics bring new challenges to this "old" algorithm. This book addresses these challenges and makes novel contributions in establishing theoretical frameworks for K-means distances and K-means based consensus clustering, identifying the "dangerous" uniform effect and zero-value dilemma of K-means, adapting the right measures for cluster validity, and integrating K-means with SVMs for rare class analysis. This book not only enriches the clustering and optimization theories, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based has won the "2010 National Excellent Doctoral Dissertation Award", the highest honor for no more than 100 PhD theses per year in China.



Best data mining books

Advances in Database Technology - EDBT 2004

This book constitutes the refereed proceedings of the 9th International Conference on Extending Database Technology, EDBT 2004, held in Heraklion, Crete, Greece, in March 2004. The 42 revised full papers presented together with 2 industrial application papers, 15 software demos, and 3 invited contributions were carefully reviewed and selected from 294 submissions.

Seeing Cities Through Big Data: Research, Methods and Applications in Urban Informatics (Springer Geography)

This book introduces the latest thinking on the use of Big Data in the context of urban systems, including research and insights on human behavior, urban dynamics, resource use, sustainability and spatial disparities, where it promises improved planning, management and governance in the urban sectors (e.

Extra resources for Advances in K-means Clustering: a Data Mining Thinking

Example text


© 2009 IEEE. Reprinted, with permission, from Ref.

The Hazard of the Biased Effect

Given the biased effect, it is very dangerous to use the entropy measure for the validation of K-means. … i.e., re0, re1, wap, ecoli, and k1a, for experiments. We ran K-means clustering on these data sets and labeled each cluster with the label of the majority of its members. We found that many true clusters disappeared from the clustering results. Fig. 7 shows the percentage of the disappeared true clusters in the clustering results.

As can be seen, a significant number of true clusters disappeared in every data set.

Fig. 7 The percentage of the disappeared true clusters in highly imbalanced data (axes: Entropy, Percentage of Classes Disappeared (%)). © 2009 IEEE. Reprinted, with permission, from Ref.

Fig. 8 The percentage of the disappeared true clusters in relatively balanced data (axes: Entropy, Percentage of Classes Disappeared (%)). © 2009 IEEE. Reprinted, with permission, from Ref.
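The labeling procedure described in the excerpt above (cluster the data, label each cluster by its majority class, then count the true classes that never appear as a majority label) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's code: the function names are hypothetical, the toy labels and cluster assignments are made up rather than produced by K-means on the data sets named in the text, and the entropy shown is the common size-weighted per-cluster entropy with log base 2, which may differ in detail from the measure used in the book.

```python
from collections import Counter, defaultdict
from math import log2


def group_by_cluster(cluster_ids, true_labels):
    """Map each cluster id to a Counter of the true labels it contains."""
    groups = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        groups[cid][label] += 1
    return groups


def disappeared_class_percentage(cluster_ids, true_labels):
    """Percentage of true classes that are not the majority label of any cluster."""
    groups = group_by_cluster(cluster_ids, true_labels)
    majority = {counts.most_common(1)[0][0] for counts in groups.values()}
    classes = set(true_labels)
    return 100.0 * len(classes - majority) / len(classes)


def clustering_entropy(cluster_ids, true_labels):
    """Size-weighted average of per-cluster label entropy (assumed definition)."""
    groups = group_by_cluster(cluster_ids, true_labels)
    n = len(true_labels)
    total = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


# Toy imbalanced data (made up): one large class "a" and two small classes.
true_labels = ["a"] * 90 + ["b"] * 6 + ["c"] * 4
# Hypothetical clustering: the large class is split across all three
# clusters, and both small classes end up as minorities in cluster 2.
cluster_ids = [0] * 40 + [1] * 40 + [2] * 10 + [2] * 6 + [2] * 4

print(disappeared_class_percentage(cluster_ids, true_labels))
print(round(clustering_entropy(cluster_ids, true_labels), 3))
```

In this toy case two of the three true classes never become a majority label, yet the entropy stays low because two of the clusters are pure. That is the kind of mismatch the excerpt warns about when entropy alone is used to validate K-means on imbalanced data.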

