truthxify

Phase 2 — Classical ML

June 2, 2026

Continued Course 3 of the Machine Learning Specialization (Recommender Systems)

What I Did

Used both collaborative filtering and content-based filtering to build a movie recommendation algorithm

Wrote the spec for the recommender algorithm for a side gig I got recently

What I Learned

A recommender system predicts how a user would rate or respond to items they haven't seen yet, then surfaces the items with the highest predicted scores.

There are two types of recommender systems:

  • Collaborative filtering: recommends item $i$ to user $j$ by looking at how other users whose ratings are similar to user $j$'s rated item $i$. It uses only the ratings matrix; it does not look at who the users are or what the items are.
  • Content-based filtering: recommends item $i$ to user $j$ based on features of user $j$ and features of item $i$, asking whether they are a good match. It uses side information about users and items, not just past ratings.
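Collaborative filtering only needs the ratings matrix. As a minimal sketch (the 4×4 toy ratings below are made up for illustration), here is one way to turn a ratings table with missing entries into a ratings matrix `Y` and a 0/1 matrix `R` marking which entries were actually rated, with rows as movies and columns as users:

```python
import numpy as np

# Hypothetical toy ratings: rows are items (movies), columns are users.
# np.nan marks a rating the user has not given.
ratings = np.array([
    [5.0, 5.0, 0.0, np.nan],
    [5.0, np.nan, np.nan, 0.0],
    [np.nan, 4.0, 0.0, np.nan],
    [0.0, 0.0, 5.0, 4.0],
])

# R[i, j] = 1 if user j has rated item i, else 0
R = (~np.isnan(ratings)).astype(float)
# Y holds the ratings; unrated entries become 0 and are masked out later by R
Y = np.nan_to_num(ratings, nan=0.0)

num_items, num_users = Y.shape
print(num_items, num_users, int(R.sum()))
```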

The collaborative filtering algorithm minimizes the following cost function with gradient descent or Adam:

$$J(w, b, x) = \sum_{(i,j):r(i,j)=1} L(w, b, x) + \frac{\lambda}{2}\sum_{j=1}^{n_{user}}\sum_{k=1}^{n}\left(w_{k}^{(j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_{item}}\sum_{k=1}^{n}\left(x_{k}^{(i)}\right)^2$$

where:

$$\begin{aligned} n_{user} &\rightarrow \text{number of users} \\ n_{item} &\rightarrow \text{number of items} \\ n &\rightarrow \text{number of features} \\ r(i,j) &\rightarrow 1 \text{ if user } j \text{ has rated item } i,\ 0 \text{ otherwise} \end{aligned}$$

The aim of collaborative filtering is to minimize $J$ with respect to $w$, $b$, and $x$.
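A minimal NumPy sketch of this cost, assuming $L$ is half the squared error $\frac{1}{2}\left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2$ and that the parameters are stacked row-wise: item vectors in `X` (n_item × n), user vectors in `W` (n_user × n), user biases in `b` (1 × n_user), with ratings `Y` and indicator `R` both n_item × n_user. The name `cofi_cost` is my own. One matrix product computes every prediction at once, and multiplying by `R` keeps only the pairs where $r(i,j)=1$:

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lambda_):
    """Regularized collaborative filtering cost with a (1/2) squared-error loss.

    X: (n_item, n) item vectors x^(i); W: (n_user, n) user vectors w^(j)
    b: (1, n_user) user biases b^(j); Y, R: (n_item, n_user)
    """
    # Prediction for every (item, user) pair: w^(j) . x^(i) + b^(j)
    predictions = X @ W.T + b
    # Zero out the pairs with r(i, j) = 0 so only rated entries count
    errors = (predictions - Y) * R
    squared_error = 0.5 * np.sum(errors ** 2)
    # L2 penalty on all user and item vectors (biases are not regularized)
    regularization = (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
    return squared_error + regularization
```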

The loss $L$ can be the squared error (as in linear regression) when predicting ratings, or binary cross-entropy (as in logistic regression) when the labels are binary, for example whether someone clicked on an item or whether they watched an ad.
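For the binary case, a sketch with item vectors as rows of `X` (n_item × n), user vectors as rows of `W` (n_user × n), biases `b` (1 × n_user), and 0/1 labels `Y` plus indicator `R` (both n_item × n_user). The prediction becomes $g(w^{(j)} \cdot x^{(i)} + b^{(j)})$ with $g$ the sigmoid, and $L$ is binary cross-entropy summed over observed pairs. The function name `binary_cofi_cost` is hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cofi_cost(X, W, b, Y, R, lambda_):
    """Collaborative filtering cost with binary cross-entropy loss.

    Y[i, j] = 1 if user j engaged with item i (clicked, watched, ...), else 0.
    R[i, j] = 1 where that label was actually observed.
    """
    # Probability that user j engages with item i; clip so log() stays finite
    p = np.clip(sigmoid(X @ W.T + b), 1e-12, 1 - 1e-12)
    bce = -(Y * np.log(p) + (1 - Y) * np.log(1 - p))
    # Sum only over observed pairs, then add the L2 penalty on W and X
    return np.sum(bce * R) + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
```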

We can implement collaborative filtering using TensorFlow's automatic differentiation, since the derivatives are tedious to work out by hand. I'll also do a manual implementation using only NumPy, just to get a feel for it and understand things better.
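A from-scratch sketch of the manual NumPy route, assuming the squared-error cost with the $\frac{1}{2}$ factor, item vectors as rows of `X` (n_item × n), user vectors as rows of `W` (n_user × n), biases `b` (1 × n_user), and `Y`, `R` of shape n_item × n_user. With the masked error $E = (XW^\top + b - Y) \odot R$, the gradients are $\partial J/\partial X = EW + \lambda X$, $\partial J/\partial W = E^\top X + \lambda W$, and $\partial J/\partial b$ is the column sum of $E$. The function names, initialization scale, and default hyperparameters are my own choices:

```python
import numpy as np

def cofi_gradients(X, W, b, Y, R, lambda_):
    """Gradients of the (1/2) squared-error collaborative filtering cost."""
    # Error on rated entries only; unrated entries contribute nothing
    E = (X @ W.T + b - Y) * R          # (n_item, n_user)
    dX = E @ W + lambda_ * X           # (n_item, n)
    dW = E.T @ X + lambda_ * W         # (n_user, n)
    db = E.sum(axis=0, keepdims=True)  # (1, n_user); biases are not regularized
    return dX, dW, db

def train_cofi(Y, R, n_features=3, lambda_=1.0, alpha=0.01, iterations=2000, seed=0):
    """Batch gradient descent that updates X, W and b together."""
    rng = np.random.default_rng(seed)
    n_item, n_user = Y.shape
    # Small random init breaks the symmetry between features
    X = rng.normal(scale=0.1, size=(n_item, n_features))
    W = rng.normal(scale=0.1, size=(n_user, n_features))
    b = np.zeros((1, n_user))
    for _ in range(iterations):
        dX, dW, db = cofi_gradients(X, W, b, Y, R, lambda_)
        X -= alpha * dX
        W -= alpha * dW
        b -= alpha * db
    return X, W, b
```

Unlike the TensorFlow version, every derivative here is written out by hand, which makes it easy to check each gradient against the cost formula.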

The TensorFlow implementation is:

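import tensorflow as tf

def cofi_cost_func(X, W, b, Y, R, lambda_):
    # Squared error on rated entries only (R == 1), plus L2 regularization on X and W
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
    return 0.5 * tf.reduce_sum(j ** 2) + (lambda_ / 2) * (tf.reduce_sum(X ** 2) + tf.reduce_sum(W ** 2))

# Assumes Y and R (num_movies x num_users, float64), num_users, num_movies,
# num_features, iterations and lambda_ are already defined
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

for step in range(iterations):
    # Record operations so TensorFlow can differentiate the cost automatically
    with tf.GradientTape() as tape:
        cost = cofi_cost_func(X, W, b, Y, R, lambda_)
    grads = tape.gradient(cost, [X, W, b])
    optimizer.apply_gradients(zip(grads, [X, W, b]))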
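Once training finishes, the predicted score for every (item, user) pair is $XW^\top + b$, and recommending means surfacing the highest-scoring items a user hasn't rated yet. A minimal sketch, assuming the trained variables have been converted to NumPy arrays (e.g. with `X.numpy()`) and `R` is a NumPy array; the helper name `top_n_unrated` is hypothetical:

```python
import numpy as np

def top_n_unrated(X, W, b, R, user, n=5):
    """Return the n unrated items with the highest predicted score for one user.

    X: (n_item, n), W: (n_user, n), b: (1, n_user), R: (n_item, n_user).
    """
    # Predicted score of every item for this user: w^(j) . x^(i) + b^(j)
    scores = X @ W[user] + b[0, user]
    # Only consider items the user has not rated yet (r(i, j) = 0)
    candidates = np.flatnonzero(R[:, user] == 0)
    # Sort those candidates by predicted score, highest first
    ranked = candidates[np.argsort(-scores[candidates])]
    return ranked[:n].tolist()
```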

We can also find items related to item $i$ using the squared distance between their feature vectors:

$$\left\|x^{(k)} - x^{(i)}\right\|^2 = \sum_{l=1}^{n}\left(x_l^{(k)} - x_l^{(i)}\right)^2$$

We compute this distance for every other item $k$, sort in ascending order, and return the top few; that gives us the "more like this" recommendation (this works for both collaborative and content-based filtering).
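A minimal NumPy sketch of the "more like this" lookup, assuming `X` holds one learned feature vector per item as its rows; `most_similar_items` is a hypothetical helper name:

```python
import numpy as np

def most_similar_items(X, item, n=5):
    """Indices of the n items whose feature vectors are closest to X[item]."""
    # Squared distance ||x^(k) - x^(i)||^2 from item i to every item k
    sq_dist = np.sum((X - X[item]) ** 2, axis=1)
    # Exclude the item itself (its distance is 0)
    sq_dist[item] = np.inf
    # Ascending order: smallest distance = most similar
    return np.argsort(sq_dist)[:n].tolist()
```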

Collaborative filtering has two main limitations. The first is the cold-start problem: how do we rank new items that few users have rated, and how do we show something reasonable to new users who have rated few items? The second is that it has no natural way to use side information (item: genres, stars, ...; user: demographics, preferences, ...).

Content-based filtering uses item features and user features to make more relevant predictions. Its model is:

$$\hat{y}^{(i,j)} = v_u^{(j)} \cdot v_m^{(i)}$$

where $v_u^{(j)}$ is a vector computed from the user features $x_u^{(j)}$ and $v_m^{(i)}$ is a vector computed from the item features $x_m^{(i)}$. $x_u^{(j)}$ and $x_m^{(i)}$ don't need to have the same dimension, but both $v$'s must have the same dimension for the dot product to work. We achieve this by passing $x_u^{(j)}$ and $x_m^{(i)}$ through neural networks whose output layers have the same number of units, so the $v$'s come out with equal dimension.
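To make the dimension argument concrete, here is a NumPy sketch with randomly initialized (untrained) weights. The feature sizes 14 and 16, the hidden size 64, and the helper names `make_tower` and `forward` are arbitrary choices for illustration. The inputs differ in size, but both networks end in 32 units, so the dot product is defined:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_tower(sizes):
    """Hypothetical random-weight MLP: a list of (W, b) pairs for the given layer sizes."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(tower, x):
    """ReLU on the hidden layers, linear output layer."""
    for k, (W, b) in enumerate(tower):
        x = x @ W + b
        if k < len(tower) - 1:
            x = np.maximum(x, 0.0)
    return x

# Different input sizes, same output size (32)
user_tower = make_tower([14, 64, 32])
item_tower = make_tower([16, 64, 32])

x_u = rng.normal(size=14)   # stands in for x_u^(j)
x_m = rng.normal(size=16)   # stands in for x_m^(i)
v_u = forward(user_tower, x_u)
v_m = forward(item_tower, x_m)
y_hat = v_u @ v_m           # predicted score for the pair (i, j)
print(v_u.shape, v_m.shape, float(y_hat))
```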

To train this model, the user and item networks are trained together as one combined model, with the cost function:

$$J(w, b, x) = \sum_{(i,j):r(i,j)=1} L(w, b, x) + \text{NN regularization term}$$

The TensorFlow implementation is:

import tensorflow as tf

# Assumes num_user_features and num_item_features are defined by the dataset.
# Both networks end in 32 units, so v_u and v_m have the same dimension.
user_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

item_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# Inputs
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
# Scale v_u to unit length; wrapping tf.linalg in a Lambda layer keeps this working with Keras 3
vu = tf.keras.layers.Lambda(lambda t: tf.linalg.l2_normalize(t, axis=1))(vu)

input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = tf.keras.layers.Lambda(lambda t: tf.linalg.l2_normalize(t, axis=1))(vm)

# Dot product v_u . v_m gives the predicted score
output = tf.keras.layers.Dot(axes=1)([vu, vm])

model = tf.keras.Model([input_user, input_item], output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss=tf.keras.losses.MeanSquaredError())

When we have a large catalogue, inference becomes a bottleneck, because we would have to run forward propagation through the two neural networks for every item in the catalogue. The fix is a two-stage architecture. The first stage, Retrieval, generates plausible candidates using fast, approximate methods such as the top movies in the user's region, ...; we can combine several of these retrieval methods to end up with about 100-1000 candidates. The second stage, Ranking, passes only these candidates into the model and orders them by predicted score.

We do this because retrieval is cheap but inexact, whereas ranking is expensive but accurate.
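A minimal sketch of the two stages, assuming each retrieval source is just a list of item ids (for example, top movies in the user's region) and using a plain dot product against an array of item vectors as a stand-in for running the real networks during ranking. The function names `retrieve` and `rank` and the default cap of 1000 are hypothetical:

```python
import numpy as np

def retrieve(candidate_lists, already_seen, limit=1000):
    """Stage 1 (retrieval): merge cheap candidate lists, drop seen items, cap the size."""
    seen = set(already_seen)
    merged = []
    for items in candidate_lists:
        merged.extend(i for i in items if i not in seen)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(merged))[:limit]

def rank(v_u, item_vectors, candidates, n=10):
    """Stage 2 (ranking): score only the retrieved candidates and keep the top n."""
    idx = np.asarray(candidates, dtype=int)
    # Stand-in for the expensive model: v_u . v_m for each candidate
    scores = item_vectors[idx] @ v_u
    return idx[np.argsort(-scores)][:n].tolist()
```

The expensive scoring runs on a few hundred candidates instead of the whole catalogue, which is the point of splitting the work this way.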

Bugs & Blockers

N/A

Concepts That Need More Time

Still trying to wrap my head around the implementation of collaborative filtering; I might need to implement it from scratch without TensorFlow to get a better feel for what I'm doing.

Also still struggling a bit with TensorFlow's auto-diff (GradientTape) and how to use it; I might need to explore some examples and projects to get a better understanding.

Tomorrow

Continue Course 3 of the Machine Learning Specialization (the optional PCA section; I've done PCA before, but it's still worth reviewing)

Will probably start the Reinforcement Learning part as well

Wins

Used both collaborative filtering and content-based filtering to build a movie recommendation algorithm

Wrote the spec for the recommender algorithm for a side gig I got recently