Bibliography: K-Means and Clustering
These references extend the storyline behind the K-Means Cluster Explorer demo with rigorous treatments of the algorithm, practical tooling for experimentation, heuristics for choosing the number of clusters, and real-world deployments in healthcare analytics.
1. Algorithm Foundations
| Resource | Type | Notes | Access |
|---|---|---|---|
| scikit-learn: Clustering | Technical guide | Detailed comparison between K-Means variants, initialization schemes, and convergence criteria. | https://scikit-learn.org/stable/modules/clustering.html |
| Wikipedia: K-means clustering | Reference article | Historical context, Lloyd's algorithm, and common refinements. | https://en.wikipedia.org/wiki/K-means_clustering |
| Bishop, C. M. (2006). Pattern Recognition and Machine Learning | Textbook | Chapter 9 derives K-Means from an expectation-maximization perspective. | https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/ |
| Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning | Textbook | Section 14.3 contrasts K-Means with model-based clustering approaches. | https://hastie.su.domains/ElemStatLearn/ |
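Several entries above cover Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The following is a minimal NumPy sketch of that loop, assuming plain random initialization (not k-means++) and squared Euclidean distance; `lloyd_kmeans` and its parameters are hypothetical names for illustration, not a library API.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical helper: basic Lloyd iterations (assignment step + update step)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize by sampling k distinct data points (plain random init).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # stop once the centroids have (almost) stopped moving
            break
    # Final assignment and inertia (within-cluster sum of squares) for the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia
```

scikit-learn's `KMeans` layers k-means++ initialization and multiple restarts (`n_init`) on top of this basic loop, which is why its results are usually more stable than a single run of a sketch like this one.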
2. Hands-On Guides & Tooling
| Resource | Focus | Language |
|---|---|---|
| Google ML Crash Course: Clustering | Interactive exercises that mirror the experimentation flow of the chapter demo. | 🇬🇧 English |
| Stanford CS229: Unsupervised Learning | Worked derivations plus Python pseudocode for implementing K-Means. | 🇬🇧 English |
| Scikit-learn Tutorial: Clustering | Companion notebook comparing K-Means with spectral clustering, agglomerative clustering, and DBSCAN. | 🇬🇧 English |
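The companion notebook is not reproduced here, but the kind of comparison it describes can be sketched with scikit-learn. This minimal example assumes the synthetic `make_moons` dataset and hand-picked parameters (`eps=0.2`, `min_samples=5`, single linkage, a nearest-neighbors affinity) chosen for illustration, and scores each method with the adjusted Rand index against the known labels.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: non-convex clusters that centroid-based K-Means
# tends to split with a straight boundary.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# Illustrative parameter choices (assumptions, not tuned recommendations).
models = {
    "KMeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "Spectral": SpectralClustering(n_clusters=2, affinity="nearest_neighbors", random_state=0),
    "Agglomerative (single linkage)": AgglomerativeClustering(n_clusters=2, linkage="single"),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5),
}

# Adjusted Rand index: 1.0 means the predicted partition matches the true moons exactly.
scores = {name: adjusted_rand_score(y_true, model.fit_predict(X)) for name, model in models.items()}
for name, score in scores.items():
    print(f"{name:32s} ARI = {score:.2f}")
```

On this toy shape the density- and connectivity-based methods typically recover both moons, while K-Means does not; on compact, roughly spherical clusters the ranking can reverse.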
3. Choosing the Number of Clusters
| Resource | Why it matters | Access |
|---|---|---|
| scikit-learn: Selecting the number of clusters | Demonstrates silhouette analysis, the same heuristic emphasized in the notebook sidebar. | https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html |
| Wikipedia: Determining the number of clusters | Summarizes the elbow method, the gap statistic, and information-criterion approaches. | https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set |
| Elbow Method for optimal value of k | Step-by-step tutorial for visualizing inertia curves in Python. | https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/ |
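The elbow and silhouette heuristics listed above can be computed side by side. A minimal scikit-learn sketch, assuming synthetic `make_blobs` data with four well-separated centers chosen for illustration; the scan starts at k=2 because the silhouette coefficient is undefined for a single cluster.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four well-separated groups (an assumption for illustration).
X, _ = make_blobs(
    n_samples=400,
    centers=[(-5, -5), (-5, 5), (5, -5), (5, 5)],
    cluster_std=0.8,
    random_state=0,
)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # within-cluster sum of squares, plotted for the elbow method
    silhouettes[k] = silhouette_score(X, km.labels_)  # mean silhouette coefficient in [-1, 1]

best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(f"k={k}: inertia={inertias[k]:8.1f}  silhouette={silhouettes[k]:.3f}")
print("k with highest mean silhouette:", best_k)
```

On real data the inertia curve often lacks a sharp elbow and the two criteria can disagree, so both are best treated as guides rather than definitive answers.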
4. Domain Applications
| Resource | Highlight | Language |
|---|---|---|
| NIH: Machine Learning in Cancer Research | Surveys how clustering supports oncology diagnostics and treatment planning. | 🇬🇧 English |
| Nature: Clustering for patient stratification | Case study on uncovering patient phenotypes via unsupervised pipelines. | 🇬🇧 English |
| PubMed: K-means clustering in medical diagnosis | Literature review of clinical support tools powered by K-Means. | 🇬🇧 English |
| WHO: Data analysis for health | Multi-language hub for public health analytics platforms that rely on clustering. | Multi-language |
Note: All links were re-checked in April 2024. For licensed materials, consult institutional libraries or open-access repositories.