Phase 2 — Classical ML
June 2, 2026
Continue Course 3 of the Machine Learning Specialization (Recommender Systems)
What I Did
Used both collaborative filtering and content-based filtering in a movie recommendation algorithm
Wrote the spec for the recommender algorithm for a side gig I got recently
What I Learned
A recommender system predicts how a user would rate or respond to items they haven't seen yet, then surfaces the items with the highest predicted scores
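A minimal sketch of the "surface the highest predicted score" step, assuming the predicted scores for one user are already available as an array. The function name `top_unseen_items` and the sample numbers are made up for illustration.

```python
import numpy as np

def top_unseen_items(predicted, rated_mask, k=3):
    """Return indices of the k unseen items with the highest predicted score.

    predicted  : (num_items,) predicted scores for one user
    rated_mask : (num_items,) bool, True where the user already rated the item
    """
    scores = np.where(rated_mask, -np.inf, predicted)  # exclude items already seen
    order = np.argsort(-scores, kind="stable")          # descending by score
    unseen_count = int((~rated_mask).sum())
    return order[:min(k, unseen_count)]

predicted = np.array([4.8, 2.1, 3.9, 4.5, 1.0])
rated_mask = np.array([True, False, False, False, True])
print(top_unseen_items(predicted, rated_mask, k=2))  # [3 2]
```

Item 0 has the highest score but is skipped because the user has already rated it.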
There are two types of recommender systems:
- Collaborative filtering: recommends an item to a user by looking at what other users with similar ratings thought of that item. It uses the ratings matrix only; it does not look at who the users are or what the items are
- Content-based filtering: recommends an item to a user based on features of the user and features of the item, asking whether they are a good match. It uses side information about users and items, not just past ratings
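Collaborative filtering uses gradient descent or Adam to minimize the cost function:

$$J(w, b, x) = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

where:
- $r(i,j) = 1$ if user $j$ has rated item $i$, and 0 otherwise
- $y^{(i,j)}$ is the rating user $j$ gave item $i$
- $w^{(j)}$, $b^{(j)}$ are the parameters for user $j$, and $x^{(i)}$ is the feature vector for item $i$
- $n_u$, $n_m$, $n$ are the numbers of users, items, and features, and $\lambda$ is the regularization strength

The aim of collaborative filtering is to minimize $J$ w.r.t. $w$, $b$, and $x$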
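A minimal NumPy sketch of this regularized squared-error cost. It assumes the same shapes as the TensorFlow variables used in this entry (X is movies × features, W is users × features, b is 1 × users) and that Y and R are movies × users, with R set to 1 where a rating exists. The name `cofi_cost_numpy` is my own and is not the course's `cofi_cost_func`.

```python
import numpy as np

def cofi_cost_numpy(X, W, b, Y, R, lambda_):
    """Regularized squared-error cost, counted over rated entries only."""
    err = (X @ W.T + b - Y) * R              # zero out unrated (movie, user) pairs
    reg = np.sum(W ** 2) + np.sum(X ** 2)    # b is not regularized
    return 0.5 * np.sum(err ** 2) + 0.5 * lambda_ * reg

# Tiny example: 2 movies, 2 users, 1 feature
X = np.array([[1.0], [2.0]])
W = np.array([[1.0], [0.5]])
b = np.zeros((1, 2))
Y = np.array([[1.0, 0.0], [3.0, 1.0]])
R = np.array([[1, 0], [1, 1]])
print(cofi_cost_numpy(X, W, b, Y, R, lambda_=0.0))  # 0.5
print(cofi_cost_numpy(X, W, b, Y, R, lambda_=1.0))  # 3.625
```

Only one rated entry is off (movie 1, user 0: predicted 2, actual 3), so the unregularized cost is 0.5 × 1² = 0.5.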
The loss can be mean squared error (as in linear regression) when we predict numeric ratings, or binary cross-entropy (as in logistic regression) when we predict binary labels, such as whether someone clicked or not, or whether they watched an ad or not
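A rough sketch of the binary-label variant, assuming Y holds 0/1 labels and R marks which (item, user) pairs were observed. Regularization is left out to keep it short, and the names `sigmoid` and `cofi_binary_cost` are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cofi_binary_cost(X, W, b, Y, R, eps=1e-12):
    """Binary cross-entropy summed over observed entries (Y holds 0/1 labels)."""
    p = sigmoid(X @ W.T + b)                 # predicted probability of a 1
    p = np.clip(p, eps, 1.0 - eps)           # avoid log(0)
    loss = -(Y * np.log(p) + (1 - Y) * np.log(1 - p))
    return np.sum(loss * R)                  # only count observed pairs

# With all-zero parameters every prediction is 0.5
X = np.zeros((2, 1))
W = np.zeros((2, 1))
b = np.zeros((1, 2))
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
R = np.array([[1, 1], [1, 0]])
cost = cofi_binary_cost(X, W, b, Y, R)
print(cost)  # 3 * ln(2): three observed entries, each predicted at 0.5
```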
We can implement collaborative filtering using TensorFlow's auto-diff, since the derivatives look a bit complex. I will also do a manual version in NumPy, just to get a feel for it and understand things better
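A first pass at the manual version, as a sketch only: hand-derived gradients of the squared-error cost with plain gradient descent instead of Adam. The shapes follow the TensorFlow variables in this entry; the function name `cost_and_grads`, the random toy data, and the learning rate are my own assumptions.

```python
import numpy as np

def cost_and_grads(X, W, b, Y, R, lambda_):
    """Cost plus gradients w.r.t. X, W, b for regularized squared error."""
    E = (X @ W.T + b - Y) * R                        # errors on rated entries only
    cost = 0.5 * np.sum(E ** 2) + 0.5 * lambda_ * (np.sum(W ** 2) + np.sum(X ** 2))
    dX = E @ W + lambda_ * X                         # (num_movies, num_features)
    dW = E.T @ X + lambda_ * W                       # (num_users, num_features)
    db = E.sum(axis=0, keepdims=True)                # (1, num_users)
    return cost, dX, dW, db

# Toy data: random ratings 1..5, roughly 70% of entries observed
rng = np.random.default_rng(0)
num_movies, num_users, num_features = 6, 4, 3
Y = rng.integers(1, 6, size=(num_movies, num_users)).astype(float)
R = (rng.random((num_movies, num_users)) < 0.7).astype(float)
X = rng.normal(size=(num_movies, num_features))
W = rng.normal(size=(num_users, num_features))
b = np.zeros((1, num_users))

alpha, lambda_ = 0.01, 1.0
history = []
for step in range(200):
    cost, dX, dW, db = cost_and_grads(X, W, b, Y, R, lambda_)
    history.append(cost)
    X, W, b = X - alpha * dX, W - alpha * dW, b - alpha * db

print(history[0], history[-1])  # the cost should go down over training
```

Comparing a gradient entry against a finite difference of the cost is a quick way to check the derivation before trusting the training loop.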
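The TensorFlow implementation is:

import tensorflow as tf

# Parameters to learn: user weights W, movie features X, user biases b
W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features), dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name='b')
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

for step in range(iterations):
    # Record the forward pass so TensorFlow can differentiate the cost
    with tf.GradientTape() as tape:
        cost = cofi_cost_func(X, W, b, Y, R, lambda_)
    # Gradients of the cost w.r.t. the trainable variables
    grads = tape.gradient(cost, [X, W, b])
    # One Adam update per iteration
    optimizer.apply_gradients(zip(grads, [X, W, b]))

We can also find items related to item $i$ using the squared distance between its feature vector $x^{(i)}$ and the feature vector $x^{(k)}$ of another item $k$:

$$\lVert x^{(k)} - x^{(i)} \rVert^2 = \sum_{l=1}^{n} \left( x_l^{(k)} - x_l^{(i)} \right)^2$$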
For every other item, we compute this distance, sort in ascending order, and return the top few, which gives us the "more like this" recommendation (this works for both collaborative and content-based filtering)
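A small sketch of the "more like this" lookup, assuming the item feature vectors are the rows of a NumPy array. The function name `related_items` and the sample vectors are hypothetical.

```python
import numpy as np

def related_items(X, i, k=3):
    """Indices of the k items whose feature vectors are closest to item i."""
    dist = np.sum((X - X[i]) ** 2, axis=1)   # squared distance to every item
    dist[i] = np.inf                          # never return the item itself
    return np.argsort(dist, kind="stable")[:k]

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.3]])
print(related_items(X, i=0, k=2))  # [1 3]
```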
The limitations of collaborative filtering are the cold start problem (how do we rank new items that few users have rated, and how do we show something reasonable to new users who have rated only a few items?) and the lack of a natural way to use side information (item: genres, stars, ...; user: demographics, preferences, ...)
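Content-based filtering uses item features and user features to make more relevant predictions. Its model predicts the rating of user $j$ on item $i$ as:

$$v_u^{(j)} \cdot v_m^{(i)}$$

where $v_u^{(j)}$ is a vector computed from the user features $x_u^{(j)}$ and $v_m^{(i)}$ is a vector computed from the item features $x_m^{(i)}$ ($x_u^{(j)}$ and $x_m^{(i)}$ don't need to have the same dimension). Both v's must have the same dimension for the dot product to work. We get this by passing $x_u^{(j)}$ and $x_m^{(i)}$ through neural networks whose output layers have the same size, so the v's are of equal dimension

To train this model, both networks are trained together as a combined model, and the cost function is:

$$J = \sum_{(i,j):\, r(i,j)=1} \left( v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)} \right)^2 + \text{NN regularization term}$$

where the sum runs over the (item, user) pairs that have a rating $y^{(i,j)}$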
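To see how the dimensions line up, here is a rough NumPy forward pass: two small untrained networks map user and item features of different sizes to vectors of the same size, and the prediction is their dot product. The layer sizes, feature counts, helper names (`make_mlp`, `forward`), and sample ratings are all assumptions for illustration, and no training happens here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, hidden_dim, out_dim):
    """Random (untrained) weights for a one-hidden-layer network."""
    return {
        "W1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.1, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def forward(params, x):
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])   # ReLU hidden layer
    return h @ params["W2"] + params["b2"]                  # linear output layer

num_user_features, num_item_features, embed_dim = 5, 8, 32
user_net = make_mlp(num_user_features, 16, embed_dim)
item_net = make_mlp(num_item_features, 16, embed_dim)

x_u = rng.normal(size=(3, num_user_features))   # features for 3 (user, item) pairs
x_m = rng.normal(size=(3, num_item_features))
v_u = forward(user_net, x_u)                     # (3, 32)
v_m = forward(item_net, x_m)                     # (3, 32)
pred = np.sum(v_u * v_m, axis=1)                 # one dot product per pair

y = np.array([4.0, 2.0, 5.0])                    # made-up ratings
cost = np.sum((pred - y) ** 2)                   # squared error, no regularization
print(v_u.shape, v_m.shape, pred.shape)          # (3, 32) (3, 32) (3,)
```

The inputs have 5 and 8 features, but both networks end in a 32-unit layer, which is what makes the dot product valid.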
The TensorFlow implementation is:

import tensorflow as tf

# User network: maps user features to a 32-dimensional vector
user_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# Item network: same output size so the dot product is defined
item_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# Inputs
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)  # scale the user vector to unit length

input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)  # scale the item vector to unit length

# Dot product
output = tf.keras.layers.Dot(axes=1)([vu, vm])
model = tf.keras.Model([input_user, input_item], output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss=tf.keras.losses.MeanSquaredError())

With a large catalogue, inference becomes a bottleneck because we would have to run forward propagation through the two neural networks for every item in the catalogue. The fix is a two-stage architecture:
- Retrieval: generate plausible candidates using methods that are fast and approximate, such as top movies in the user's region, ... We can combine several of these retrieval methods to end up with about 100-1000 candidates
- Ranking: pass these candidates into the model and rank them by predicted score

This works because retrieval is cheap and inexact, whereas ranking is expensive and accurate
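A toy sketch of the two stages. Here a dot product against random item vectors stands in for the trained ranking model, and the candidate lists are made up; `retrieve`, `rank`, `top_in_region`, and `similar_to_recent` are hypothetical names, not part of any library.

```python
import numpy as np

def retrieve(candidate_lists, max_candidates=1000):
    """Stage 1: merge cheap candidate lists, dropping duplicates, keeping order."""
    seen, merged = set(), []
    for items in candidate_lists:
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:max_candidates]

def rank(v_u, item_vectors, candidates, k=5):
    """Stage 2: score only the retrieved candidates and return the top k."""
    scores = item_vectors[candidates] @ v_u
    order = np.argsort(-scores, kind="stable")   # descending by score
    return [candidates[j] for j in order[:k]]

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(10_000, 32))     # stand-in for item network outputs
v_u = rng.normal(size=32)                        # stand-in for the user network output

top_in_region = [5, 17, 42, 99]                  # hypothetical retrieval sources
similar_to_recent = [42, 7, 1234]
candidates = retrieve([top_in_region, similar_to_recent])
print(candidates)                                # [5, 17, 42, 99, 7, 1234]
print(rank(v_u, item_vectors, candidates, k=3))
```

Only 6 of the 10,000 items are ever scored, which is the point of putting retrieval in front of ranking.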
Bugs & Blockers
N/A
Concepts That Need More Time
Still trying to wrap my head around the implementation of collaborative filtering; I might need to implement it from scratch without TensorFlow to get a better feel for what I'm doing
Also still struggling a bit with the new auto-diff in TensorFlow and how to use it; I might need to explore some examples and projects to get a better understanding
Tomorrow
Continue Course 3 of the Machine Learning Specialization (the optional part on PCA; I've done PCA before, but it's still worth checking)
Will probably start the Reinforcement Learning part as well
Wins
Used both collaborative filtering and content-based filtering in a movie recommendation algorithm
Wrote the spec for the recommender algorithm for a side gig I got recently