Phase 2 — Classical ML
May 26, 2026
Started Course 3 of the Machine Learning Specialization
What I Did
Implemented a k-means algorithm from scratch and applied it to image compression
What I Learned
Clustering looks at a dataset and tries to find points that are similar to each other and group them together
This can be applied in a bunch of places, such as astronomy (grouping astronomical objects together), market segmentation, and DNA microarray data analysis
The way it works is that we pick K points at random and call them cluster centroids, then assign each training example to its closest centroid. Then we move each centroid to the average of the points assigned to it. We repeat these two steps iteratively until the assignments stop changing (convergence). This gives a locally optimal clustering, not a guaranteed best one
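A minimal sketch of the loop described above, in plain Python so it runs without extra libraries. The function names, the toy points, and the choice to leave an empty cluster's centroid where it is are my own assumptions for illustration, not the course's reference code.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_points(points, centroids):
    """Assignment step: index of the closest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]

def move_centroids(points, assignments, centroids):
    """Move step: each centroid becomes the mean of the points assigned to it."""
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, assignments) if c == k]
        if members:
            moved.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            moved.append(old)  # assumption: an empty cluster keeps its old centroid
    return moved

def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k training examples picked at random
    assignments = None
    for _ in range(max_iters):
        new_assignments = assign_points(points, centroids)
        if new_assignments == assignments:  # nothing changed: converged
            break
        assignments = new_assignments
        centroids = move_centroids(points, assignments, centroids)
    return centroids, assignments

# Toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
```

The first three points should end up sharing one label and the last three the other; which label is 0 and which is 1 depends on the random initialization.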
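The optimization objective is to minimize a specific cost function called the distortion function:
$$J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$$
The distortion function is the average squared distance from each training example $x^{(i)}$ to the centroid $\mu_{c^{(i)}}$ it has been assigned to, where $c^{(i)}$ is the index of the cluster assigned to example $i$ and $\mu_k$ is the centroid of cluster $k$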
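To make the definition concrete, here is a small sketch that computes the distortion for a hand-made example. The function name `distortion` and the three toy points are assumptions for illustration.

```python
def distortion(points, centroids, assignments):
    """Average squared distance from each point to its assigned centroid."""
    total = 0.0
    for point, c in zip(points, assignments):
        total += sum((x - mu) ** 2 for x, mu in zip(point, centroids[c]))
    return total / len(points)

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centroids = [(1.0, 0.0), (10.0, 0.0)]
assignments = [0, 0, 1]  # cluster index for each point

# Squared distances are 1, 1 and 0, so the average is 2/3.
J = distortion(points, centroids, assignments)
```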
Each iteration of k-means lowers (or keeps) the distortion $J$ in two steps:
- The assignment step minimizes $J$ over $c^{(1)}, \dots, c^{(m)}$ with $\mu_1, \dots, \mu_K$ held fixed. The closest centroid is the best choice for each point
- The move step minimizes $J$ over $\mu_1, \dots, \mu_K$ with $c^{(1)}, \dots, c^{(m)}$ held fixed. The mean is the point that minimizes the sum of squared distances to a set of points
$J$ should always go down or stay flat (converged); it should never increase. If $J$ goes up at any point, there is a bug in the implementation
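One way to use this as a debugging check is to record the distortion after every assignment step and every move step and confirm the sequence never rises. This is a sketch under my own assumptions: the helper names, the random toy data, and the small floating-point tolerance are all made up for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distortion(points, centroids, assignments):
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, assignments)) / len(points)

def assign(points, centroids):
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k])) for p in points]

def move(points, assignments, centroids):
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, assignments) if c == k]
        # assumption: an empty cluster keeps its old centroid
        moved.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else old)
    return moved

rng = random.Random(1)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
centroids = rng.sample(points, 3)

history = []
for _ in range(20):
    assignments = assign(points, centroids)
    history.append(distortion(points, centroids, assignments))  # after the assignment step
    centroids = move(points, assignments, centroids)
    history.append(distortion(points, centroids, assignments))  # after the move step

# Allow a tiny tolerance for floating-point rounding.
never_increases = all(later <= earlier + 1e-9 for earlier, later in zip(history, history[1:]))
```

If `never_increases` ever comes out `False`, the first place to look is whether the two steps are using stale assignments or stale centroids.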
The way we initialize the centroids is:
- Choose $K < m$, where $m$ is the number of training examples
- Randomly pick $K$ distinct training examples
- Set $\mu_1, \dots, \mu_K$ equal to those examples
We run this multiple times, about 50 to 1000 (anything past that gives diminishing returns), compute $J$ for each run, and pick the run with the smallest $J$
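The initialization and the multiple runs can be sketched together. The names `run_k_means` and `best_of_restarts`, the toy points, and the default of 50 restarts are assumptions for illustration. `random.Random.sample` picks distinct positions from the list and raises `ValueError` when K is larger than the number of points.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_k_means(points, k, rng, max_iters=100):
    """One k-means run from a random initialization; returns (J, centroids, assignments)."""
    centroids = rng.sample(points, k)  # K distinct training examples as initial centroids
    assignments = None
    for _ in range(max_iters):
        new_assignments = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(sq_dist(p, centroids[c]) for p, c in zip(points, assignments)) / len(points)
    return cost, centroids, assignments

def best_of_restarts(points, k, restarts=50, seed=0):
    """Run k-means `restarts` times and keep the run with the smallest distortion."""
    rng = random.Random(seed)
    return min((run_k_means(points, k, rng) for _ in range(restarts)), key=lambda run: run[0])

# Toy data: three pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
best_cost, best_centroids, best_assignments = best_of_restarts(points, k=3)
```

For this toy data the best possible clustering puts each pair in its own cluster, which gives a distortion of 0.25. A single unlucky initialization can get stuck with a worse value, which is the reason for the restarts.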
The number of clusters is up to us: we pick the value that best serves the purpose of what we want to do
Bugs & Blockers
N/A
Concepts That Need More Time
How k-means can be used for image compression. It seemed a bit odd to me at first, but I figured it out. I'm currently thinking about more things it can be applied to and need some time to find a natural fit for it.
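For my own reference, a rough sketch of the image compression idea: cluster the pixel colours with k-means, then store a small palette of K colours plus one palette index per pixel instead of a full 24-bit colour per pixel. The toy "image", the helper name `k_means_palette`, the value K = 4, and the bit accounting are all assumptions for illustration, not what I ran in the notebook.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means_palette(pixels, k, iters=10, seed=0):
    """Cluster RGB pixels into k colours; return (palette, palette index per pixel)."""
    rng = random.Random(seed)
    palette = rng.sample(sorted(set(pixels)), k)  # k distinct colours as initial centroids
    for _ in range(iters):
        indices = [min(range(k), key=lambda j: sq_dist(p, palette[j])) for p in pixels]
        for j in range(k):
            members = [p for p, c in zip(pixels, indices) if c == j]
            if members:  # assumption: an unused palette entry keeps its old colour
                palette[j] = tuple(sum(col) / len(members) for col in zip(*members))
    indices = [min(range(k), key=lambda j: sq_dist(p, palette[j])) for p in pixels]
    return palette, indices

# Toy "image": 64 pixels of noisy red, green and blue (channel values 0-255).
noise = random.Random(42)
base_colours = [(220, 30, 30), (30, 200, 40), (20, 40, 210)]
pixels = [
    tuple(min(255, max(0, channel + noise.randint(-15, 15))) for channel in noise.choice(base_colours))
    for _ in range(64)
]

K = 4
palette, indices = k_means_palette(pixels, K)

# Reconstructed image: every pixel is replaced by its palette colour.
compressed = [tuple(round(v) for v in palette[i]) for i in indices]

original_bits = len(pixels) * 24                                     # 24 bits per pixel
compressed_bits = K * 24 + len(pixels) * math.ceil(math.log2(K))    # palette + indices
```

The saving comes from the indices: with K = 4 each pixel needs only 2 bits, at the cost of every pixel being approximated by its nearest palette colour.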
Tomorrow
Continue Course 3 of the Machine Learning Specialization
Wins
Implemented the K-Means algorithm and applied it to image compression (Google Colab)