📚 Bibliography: K-Means and Clustering

These references extend the K-Means Cluster Explorer demo with rigorous treatments of the algorithm, practical tooling for experimentation, heuristics for choosing the number of clusters, and real-world deployments in healthcare analytics.


Table of Contents

  1. Algorithm Foundations
  2. Hands-On Guides & Tooling
  3. Choosing the Number of Clusters
  4. Domain Applications

1. Algorithm Foundations

| Resource | Type | Notes | Access |
| --- | --- | --- | --- |
| scikit-learn: Clustering | Technical guide | Detailed comparison between K-Means variants, initialization schemes, and convergence criteria. | https://scikit-learn.org/stable/modules/clustering.html |
| Wikipedia: K-means clustering | Reference article | Historical context, Lloyd's algorithm, and common refinements. | https://en.wikipedia.org/wiki/K-means_clustering |
| Bishop, C. M. (2006). Pattern Recognition and Machine Learning | Textbook | Chapter 9 develops the derivation of K-Means from an expectation–maximization perspective. | https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/ |
| Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning | Textbook | Section 14.3 contrasts K-Means with model-based clustering approaches. | https://hastie.su.domains/ElemStatLearn/ |
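
As a companion to the references above, here is a minimal sketch of Lloyd's algorithm in plain Python. It is illustrative only and is not taken from any of the listed sources: the function name `lloyd_kmeans`, the random-sample initialization, and the stopping rule (stop when assignments no longer change) are assumptions chosen for brevity. Production libraries such as scikit-learn use more careful initialization and convergence checks.

```python
import random


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Sketch of Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initialize centroids by sampling k data points at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, assignment = lloyd_kmeans(data, k=2)
    print(centers, assignment)
```

The two alternating steps correspond to the assignment and update phases that the textbook treatments above relate to expectation–maximization.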

2. Hands-On Guides & Tooling

| Resource | Focus | Language |
| --- | --- | --- |
| Google ML Crash Course: Clustering | Interactive exercises that mirror the experimentation flow of the chapter demo. | 🇬🇧 English |
| Stanford CS229: Unsupervised Learning | Worked derivations plus Python pseudocode for implementing K-Means. | 🇬🇧 English |
| Scikit-learn Tutorial – Clustering | Companion notebook comparing K-Means with Spectral, Agglomerative, and DBSCAN clustering. | 🇬🇧 English |

3. Choosing the Number of Clusters

| Resource | Why it matters | Access |
| --- | --- | --- |
| scikit-learn: Selecting the number of clusters | Demonstrates silhouette analysis, the same heuristic emphasized in the notebook sidebar. | https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html |
| Wikipedia: Determining the number of clusters | Summarizes elbow, gap statistic, and information criteria-based approaches. | https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set |
| Elbow Method for optimal value of k | Step-by-step tutorial for visualizing inertia curves in Python. | https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/ |
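
To make the silhouette heuristic referenced above concrete, the following sketch computes a mean silhouette coefficient in plain Python. It is a simplified illustration, not the scikit-learn implementation: the helper name `mean_silhouette`, the toy data, and the convention of scoring singleton clusters as 0 are assumptions made here. It needs Python 3.8 or later for `math.dist` and assumes at least two clusters with points that are not all identical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient for a labeling with at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is included in the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    compact = [0, 0, 0, 1, 1, 1]  # labels that follow the two visible groups
    mixed = [0, 1, 0, 1, 0, 1]    # labels that cut across the groups
    print(mean_silhouette(data, compact), mean_silhouette(data, mixed))
```

In practice, one would compute this score for each candidate number of clusters and prefer the value of k with the highest mean, which is the comparison the silhouette analysis example above visualizes.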

4. Domain Applications

| Resource | Highlight | Language |
| --- | --- | --- |
| NIH: Machine Learning in Cancer Research | Surveys how clustering supports oncology diagnostics and treatment planning. | 🇬🇧 English |
| Nature: Clustering for patient stratification | Case study on uncovering patient phenotypes via unsupervised pipelines. | 🇬🇧 English |
| PubMed: K-means clustering in medical diagnosis | Literature review of clinical support tools powered by K-Means. | 🇬🇧 English |
| WHO: Data analysis for health | Multi-language hub for public health analytics platforms that rely on clustering. | 🌐 Multi-language |

Note: All links were re-checked in April 2024. For licensed materials, consult institutional libraries or open-access repositories.