By Junjie Wu
Nearly everyone knows the K-means algorithm in the fields of data mining and business intelligence. But the ever-emerging data with extremely complicated characteristics bring new challenges to this "old" algorithm. This book addresses these challenges and makes novel contributions in establishing theoretical frameworks for K-means distances and K-means based consensus clustering, identifying the "dangerous" uniform effect and zero-value dilemma of K-means, adapting the right measures for cluster validity, and integrating K-means with SVMs for rare class analysis. This book not only enriches the clustering and optimization theories, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based won the "2010 National Excellent Doctoral Dissertation Award", the highest honor for no more than 100 PhD theses per year in China.
Read or Download Advances in K-means Clustering: a Data Mining Thinking PDF
Best data mining books
This book constitutes the refereed proceedings of the 9th International Conference on Extending Database Technology, EDBT 2004, held in Heraklion, Crete, Greece, in March 2004. The 42 revised full papers presented together with 2 industrial application papers, 15 software demos, and 3 invited contributions were carefully reviewed and selected from 294 submissions.
This book introduces the latest thinking on the use of big data in the context of urban systems, including research and insights on human behavior, urban dynamics, resource use, sustainability and spatial disparities, where it promises improved planning, management and governance in the urban sectors (e.
- Dark Web: Exploring and Data Mining the Dark Side of the Web
- PostgreSQL Server Programming - Second Edition
- Data Mining in Finance: Advances in Relational and Hybrid Methods (The Springer International Series in Engineering and Computer Science)
- Advances in Information Retrieval: 32nd European Conference on IR Research, ECIR 2010, Milton Keynes, UK, March 28-31, 2010. Proceedings (Lecture Notes in Computer Science)
Extra resources for Advances in K-means Clustering: a Data Mining Thinking
© 2009 IEEE. Reprinted, with permission, from Ref.
5 The Hazard of the Biased Effect
Given the biased effect, it is very dangerous to use the entropy measure for the validation of K-means. e. re0, re1, wap, ecoli, and k1a, for experiments. We ran K-means clustering on these data sets and labeled each cluster with the label of the majority of its members. We found that many true clusters disappeared from the clustering results. Fig. 7 shows the percentage of the disappeared true clusters in the clustering results. As can be seen, in every data set a significant number of true clusters disappeared.
Fig. 7 The percentage of the disappeared true clusters in highly imbalanced data (axes: Entropy; Percentage of Classes Disappeared (%)). © 2009 IEEE. Reprinted, with permission, from Ref.
Fig. 8 The percentage of the disappeared true clusters in relatively balanced data (axes: Entropy; Percentage of Classes Disappeared (%)). © 2009 IEEE. Reprinted, with permission, from Ref.
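The majority-labeling procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's code: the function names, the toy labels, and the cluster assignments are all made up, and the entropy function uses the commonly used size-weighted definition with a base-2 logarithm, which is an assumption about the exact variant used in the book.

```python
from collections import Counter, defaultdict
from math import log2


def majority_labels(true_labels, cluster_ids):
    """Map each cluster id to the most common true label among its members.

    Ties are broken by first occurrence (Counter.most_common behavior).
    """
    members = defaultdict(list)
    for label, cid in zip(true_labels, cluster_ids):
        members[cid].append(label)
    return {cid: Counter(labels).most_common(1)[0][0]
            for cid, labels in members.items()}


def percent_classes_disappeared(true_labels, cluster_ids):
    """Percentage of true classes that are not the majority label of any cluster."""
    all_classes = set(true_labels)
    surviving = set(majority_labels(true_labels, cluster_ids).values())
    return 100.0 * len(all_classes - surviving) / len(all_classes)


def clustering_entropy(true_labels, cluster_ids):
    """Size-weighted average of per-cluster class entropy (assumed definition, log base 2)."""
    members = defaultdict(list)
    for label, cid in zip(true_labels, cluster_ids):
        members[cid].append(label)
    n = len(true_labels)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        h = -sum((c / size) * log2(c / size) for c in Counter(labels).values())
        total += (size / n) * h
    return total


# Hypothetical imbalanced toy data: class "c" has only two members.
true_labels = ["a"] * 6 + ["b"] * 4 + ["c"] * 2
# Hypothetical cluster assignments, e.g. as returned by a K-means run with k = 3.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 1, 2]

print(majority_labels(true_labels, cluster_ids))
print(percent_classes_disappeared(true_labels, cluster_ids))
print(clustering_entropy(true_labels, cluster_ids))
```

In this toy case the small class "c" is never the majority in any cluster, so it counts as a disappeared true cluster even though the entropy value by itself does not reveal that.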