📚 Bibliography: K-Means and Clustering

These references extend the K-Means Cluster Explorer demo. They cover rigorous treatments of the algorithm, practical tooling for experimentation, heuristics for choosing the number of clusters, and real-world deployments in healthcare analytics.

1. Algorithm Foundations

Resource	Type	Notes	Access
scikit-learn: Clustering	Technical guide	Detailed comparison between K-Means variants, initialization schemes, and convergence criteria.	https://scikit-learn.org/stable/modules/clustering.html
Wikipedia: K-means clustering	Reference article	Historical context, Lloyd’s algorithm, and common refinements.	https://en.wikipedia.org/wiki/K-means_clustering
Bishop, C. M. (2006). Pattern Recognition and Machine Learning	Textbook	Chapter 9 develops the derivation of K-Means from an expectation–maximization perspective.	https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning	Textbook	Section 14.3 contrasts K-Means with model-based clustering approaches.	https://hastie.su.domains/ElemStatLearn/
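
As a rough companion to the references above, the following is a minimal pure-Python sketch of Lloyd's algorithm, the alternating assign-and-update procedure they describe. It is an illustration under stated assumptions, not a reference implementation: the names `lloyd_kmeans` and `sq_dist`, the plain random initialization, and the sample points are all made up for this example. scikit-learn defaults to k-means++ initialization and is far more optimized.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize by sampling k distinct data points at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = lloyd_kmeans(sample, k=2)
print(sorted(centroids), labels)
```

The loop stops when an update leaves every centroid unchanged, which is one common convergence criterion; tolerance-based stopping is a frequent alternative.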

2. Hands-On Guides & Tooling

Resource	Focus	Language
Google ML Crash Course: Clustering	Interactive exercises that mirror the experimentation flow of the chapter demo.	🇬🇧 English
Stanford CS229: Unsupervised Learning	Worked derivations plus Python pseudocode for implementing K-Means.	🇬🇧 English
Scikit-learn Tutorial – Clustering	Companion notebook comparing K-Means with Spectral, Agglomerative, and DBSCAN clustering.	🇬🇧 English

3. Choosing the Number of Clusters

Resource	Why it matters	Access
scikit-learn: Selecting the number of clusters	Demonstrates silhouette analysis—the same heuristic emphasized in the notebook sidebar.	https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
Wikipedia: Determining the number of clusters	Summarizes elbow, gap statistic, and information criteria-based approaches.	https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Elbow Method for optimal value of k	Step-by-step tutorial for visualizing inertia curves in Python.	https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/
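
To make the silhouette heuristic listed above concrete, here is a small pure-Python sketch that scores a given labeling. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster; the score is `(b - a) / max(a, b)`. The function name `mean_silhouette`, the sample points, and the two candidate labelings are assumptions chosen for illustration. In practice `sklearn.metrics.silhouette_score` would normally be used instead.

```python
import math
from collections import defaultdict


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (Euclidean distance)."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: two tight groups, labeled with k=2 and with k=3.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 1, 2, 2, 2]
print(mean_silhouette(sample, two_clusters))
print(mean_silhouette(sample, three_clusters))
```

On this toy data the two-cluster labeling scores higher than the three-cluster one, which is the kind of comparison silhouette analysis uses to choose among candidate values of k.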

4. Domain Applications

Resource	Highlight	Language
NIH: Machine Learning in Cancer Research	Surveys how clustering supports oncology diagnostics and treatment planning.	🇬🇧 English
Nature: Clustering for patient stratification	Case study on uncovering patient phenotypes via unsupervised pipelines.	🇬🇧 English
PubMed: K-means clustering in medical diagnosis	Literature review of clinical support tools powered by K-Means.	🇬🇧 English
WHO: Data analysis for health	Multi-language hub for public health analytics platforms that rely on clustering.	🌐 Multi-language

Note: All links were re-checked in April 2024. For licensed materials, consult institutional libraries or open-access repositories.
