Phase 2 — Classical ML

June 2, 2026

Continued Course 3 of the Machine Learning Specialization (Recommender Systems)

What I Did

Used both collaborative filtering and content-based filtering to build a movie recommendation algorithm

Wrote the spec for the recommender algorithm for a side gig I got recently

What I Learned

A recommender system predicts how a user would rate or respond to items they haven't seen yet, then surfaces the items with the highest predicted scores.
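A minimal plain-Python sketch of that last step, assuming the model has already produced a predicted score for every item for one user. `recommend_top_n`, the scores, and the `seen` set are hypothetical, for illustration only.

```python
def recommend_top_n(predicted_scores, seen, n=3):
    """Return the indices of the n unseen items with the highest predicted score."""
    # Only consider items the user hasn't rated/seen yet
    candidates = [i for i in range(len(predicted_scores)) if i not in seen]
    # Sort candidates by predicted score, highest first, and keep the top n
    candidates.sort(key=lambda i: predicted_scores[i], reverse=True)
    return candidates[:n]


# Hypothetical predicted ratings (0-5) for one user across 6 movies
scores = [4.8, 2.1, 3.9, 4.5, 1.0, 4.9]
already_rated = {0, 5}
print(recommend_top_n(scores, already_rated, n=2))  # [3, 2]
```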

There are two types of recommender systems:

Collaborative filtering: recommends item $i$ to user $j$ by looking at what other users whose ratings are similar to user $j$'s thought of item $i$. It uses only the ratings matrix; it does not look at who the users are or what the items are.
Content-based filtering: recommends item $i$ to user $j$ based on features of user $j$ and features of item $i$, asking whether they are a good match. It uses side information about users and items, not just past ratings.

The collaborative filtering algorithm minimizes the following cost function using gradient descent or Adam:

J(w, b, x) = \sum_{(i,j):r(i,j)=1}L\left(w^{(j)} \cdot x^{(i)} + b^{(j)},\, y^{(i,j)}\right) + \frac{\lambda}{2}\sum_{j=1}^{n_{user}}\sum_{k=1}^{n}\left(w_{k}^{(j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_{item}}\sum_{k=1}^{n}\left(x_{k}^{(i)}\right)^2

where:

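\begin{aligned} n_{user} &\rightarrow \text{number of users} \\ n_{item} &\rightarrow \text{number of items} \\ n &\rightarrow \text{number of features} \\ r(i,j) &\rightarrow 1 \text{ if user } j \text{ has rated item } i\text{, } 0 \text{ otherwise} \\ y^{(i,j)} &\rightarrow \text{rating given by user } j \text{ to item } i \end{aligned}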

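The goal of collaborative filtering is to minimize $J$ with respect to $w$, $b$, and $x$. Unlike ordinary linear regression, the item features $x$ are learned as parameters too.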
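To make $J$ concrete, here is a plain-Python sketch that evaluates it term by term. It assumes the squared-error loss $L = \frac{1}{2}\left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2$ (the $\frac{1}{2}$ is a common convention I'm assuming), with `Y[i][j]` as user $j$'s rating of item $i$ and `R[i][j]` as $r(i,j)$. `cofi_cost` and the toy numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def cofi_cost(X, W, b, Y, R, lambda_):
    """Collaborative filtering cost J(w, b, x) with a squared-error loss.

    X[i]: item vector x^(i), W[j]: user vector w^(j), b[j]: user bias b^(j)
    Y[i][j]: rating of item i by user j, R[i][j]: 1 if user j rated item i, else 0
    """
    cost = 0.0
    for i in range(len(Y)):
        for j in range(len(Y[0])):
            if R[i][j] == 1:  # only rated (i, j) pairs enter the loss sum
                err = dot(W[j], X[i]) + b[j] - Y[i][j]
                cost += 0.5 * err ** 2
    # L2 regularization over every user vector and every item vector (b is not regularized)
    cost += (lambda_ / 2) * sum(w_k ** 2 for w in W for w_k in w)
    cost += (lambda_ / 2) * sum(x_k ** 2 for x in X for x_k in x)
    return cost


# Toy example: 2 items, 1 user, n = 2 features; only item 0 has been rated
X = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, 1.0]]
b = [0.5]
Y = [[3.0], [0.0]]
R = [[1], [0]]
print(cofi_cost(X, W, b, Y, R, lambda_=0.0))  # 0.125
```

Only the rated pair (item 0, user 0) contributes to the loss term; with `lambda_=1.0` the two regularization sums would add $\frac{1}{2}(2^2 + 1^2) + \frac{1}{2}(1^2 + 1^2) = 3.5$ on top.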

The loss $L$ can be the squared error (for predicting ratings, a linear regression problem) or binary cross-entropy (for binary labels, a logistic regression problem). The logistic version is what we use to predict things like whether someone clicked on an item or whether they watched an ad.
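For binary labels, the prediction becomes $g\left(w^{(j)} \cdot x^{(i)} + b^{(j)}\right)$ with $g$ the sigmoid, and $L$ becomes binary cross-entropy. A minimal plain-Python sketch for a single $(i, j)$ pair; the vectors are made-up numbers and `binary_cf_loss` is a hypothetical helper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cf_loss(w, x, b, y):
    """Binary cross-entropy for one (item i, user j) pair; y = 1 (clicked) or 0 (didn't)."""
    # Predicted probability of engagement: g(w^(j) . x^(i) + b^(j))
    f = sigmoid(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
    # Note: f can round to exactly 0 or 1 for extreme inputs; fine for a toy example
    return -y * math.log(f) - (1 - y) * math.log(1 - f)


# Hypothetical learned vectors: w . x + b = 1*2 + (-0.5)*1 + 0 = 1.5
w_user, x_item, b_user = [1.0, -0.5], [2.0, 1.0], 0.0
print(binary_cf_loss(w_user, x_item, b_user, 1))  # small loss: the model leans towards "clicked"
print(binary_cf_loss(w_user, x_item, b_user, 0))  # larger loss for "didn't click"
```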

The derivatives get a bit messy, so we can implement collaborative filtering with TensorFlow's auto-diff. I'll also do a manual version with NumPy later on, just to get a feel for it and understand things better.
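A manual version of the derivatives, sketched in plain Python (lists instead of NumPy, so it has no dependencies). It assumes the squared-error loss $\frac{1}{2}\left(e^{(i,j)}\right)^2$ with $e^{(i,j)} = w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}$, which gives $\frac{\partial J}{\partial w_k^{(j)}} = \sum_{i:r(i,j)=1} e^{(i,j)} x_k^{(i)} + \lambda w_k^{(j)}$, $\frac{\partial J}{\partial b^{(j)}} = \sum_{i:r(i,j)=1} e^{(i,j)}$ and $\frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1} e^{(i,j)} w_k^{(j)} + \lambda x_k^{(i)}$. `cofi_gradient_step` and the toy data are hypothetical.

```python
def cofi_gradient_step(X, W, b, Y, R, lambda_, alpha):
    """One batch gradient descent step on J(w, b, x).

    Returns the updated (X, W, b) and the cost J evaluated *before* the update.
    """
    n_item, n_user, n = len(Y), len(Y[0]), len(X[0])
    # Start each gradient from its regularization term (b is not regularized)
    dX = [[lambda_ * v for v in x] for x in X]
    dW = [[lambda_ * v for v in w] for w in W]
    db = [0.0] * n_user
    cost = (lambda_ / 2) * (sum(v * v for w in W for v in w) + sum(v * v for x in X for v in x))
    for i in range(n_item):
        for j in range(n_user):
            if R[i][j] != 1:
                continue  # unrated pairs contribute nothing
            err = sum(W[j][k] * X[i][k] for k in range(n)) + b[j] - Y[i][j]
            cost += 0.5 * err ** 2
            for k in range(n):
                dW[j][k] += err * X[i][k]  # dJ/dw_k^(j)
                dX[i][k] += err * W[j][k]  # dJ/dx_k^(i)
            db[j] += err                   # dJ/db^(j)
    # Simultaneous update: all gradients above were computed from the old parameters
    X = [[X[i][k] - alpha * dX[i][k] for k in range(n)] for i in range(n_item)]
    W = [[W[j][k] - alpha * dW[j][k] for k in range(n)] for j in range(n_user)]
    b = [b[j] - alpha * db[j] for j in range(n_user)]
    return X, W, b, cost


# Toy data: 3 movies x 2 users; a 0 in R means "not rated" (the 0.0 in Y is ignored)
Y = [[5.0, 0.0], [4.0, 1.0], [0.0, 5.0]]
R = [[1, 0], [1, 1], [0, 1]]
X = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]  # hypothetical small initial values
W = [[0.2, -0.3], [0.1, 0.5]]
b = [0.0, 0.0]

costs = []
for step in range(500):
    X, W, b, cost = cofi_gradient_step(X, W, b, Y, R, lambda_=0.1, alpha=0.02)
    costs.append(cost)
print(costs[0], costs[-1])  # the cost should drop substantially
```

Each step updates $w$, $b$ and $x$ simultaneously from the same error terms, which is the bookkeeping that `tf.GradientTape` handles automatically.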

The TensorFlow implementation is:

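```python
import tensorflow as tf

# Assumed vectorized cost with squared-error loss: R masks out unrated (movie, user) pairs
def cofi_cost_func(X, W, b, Y, R, lambda_):
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
    return 0.5 * tf.reduce_sum(err**2) + (lambda_ / 2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))

# Y, R (num_movies x num_users, float64), num_users, num_movies, num_features, iterations, lambda_ assumed defined
W = tf.Variable(tf.random.normal((num_users,  num_features), dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features), dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal((1,          num_users),    dtype=tf.float64), name='b')

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

for step in range(iterations):
    # Record the forward pass so TensorFlow can differentiate the cost automatically
    with tf.GradientTape() as tape:
        cost = cofi_cost_func(X, W, b, Y, R, lambda_)
    # Gradients of J w.r.t. X, W and b, then one Adam update of all three
    grads = tape.gradient(cost, [X, W, b])
    optimizer.apply_gradients(zip(grads, [X, W, b]))
```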

We can also find items related to item $i$ using the squared distance between their feature vectors:

||x^{(k)} - x^{(i)}||^2 = \sum_{l=1}^n\left(x_l^{(k)} - x_l^{(i)}\right)^2

We compute this distance to every other item $k$, sort in ascending order, and return the top few; that gives us the "more like this" recommendation. (This works for both collaborative and content-based filtering.)
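A plain-Python sketch of that lookup, assuming the item vectors are already learned and stored as lists (`X[k]` is $x^{(k)}$; for content-based filtering these would be the $v_m$ vectors instead). `more_like_this` and the toy vectors are hypothetical.

```python
def squared_distance(u, v):
    # ||x^(k) - x^(i)||^2, summed over the n features
    return sum((u_l - v_l) ** 2 for u_l, v_l in zip(u, v))


def more_like_this(X, i, top=2):
    """Indices of the `top` items whose feature vectors are closest to item i."""
    others = [k for k in range(len(X)) if k != i]
    # Ascending sort: smallest distance = most similar
    others.sort(key=lambda k: squared_distance(X[k], X[i]))
    return others[:top]


# Hypothetical learned item vectors (n = 2 features)
X = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5], [0.8, -0.2]]
print(more_like_this(X, 0))  # [1, 3]
```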

Collaborative filtering has two main limitations: the cold start problem (how do we rank new items that few users have rated, and how do we show something reasonable to new users who have rated only a few items?) and the lack of a natural way to use side information (item: genres, stars, ...; user: demographics, preferences, ...).

Content-based filtering uses item features and user features to make more relevant predictions. Its model is:

\hat{y}^{(i,j)} = v_u^{(j)} \cdot v_m^{(i)}

where $v_u^{(j)}$ is a vector computed from the user features $x_u^{(j)}$ and $v_m^{(i)}$ is a vector computed from the item features $x_m^{(i)}$. $x_u^{(j)}$ and $x_m^{(i)}$ don't need to have the same dimension, but both $v$'s must, for the dot product to work. We achieve this by passing $x_u^{(j)}$ and $x_m^{(i)}$ through two neural networks (a user network and an item network) whose output layers have the same number of units, so the $v$'s come out with equal dimension.
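The dimension bookkeeping is easier to see with a toy version. This plain-Python sketch swaps each neural network for a single hand-picked linear layer (hypothetical weights, no training), maps a 3-feature user and a 2-feature movie to 2-dimensional vectors, L2-normalizes them, and takes the dot product. `dense`, `l2_normalize`, and all numbers are illustrative assumptions.

```python
import math


def dense(weights, x):
    """One linear layer: returns weights @ x (weights has one row per output unit)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def l2_normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


# Hypothetical features: the user has 3 features, the movie has 2 (dimensions differ)
x_u = [1.0, 0.0, 2.0]
x_m = [0.5, 1.5]

# Hypothetical "networks": single linear layers that both output 2-dimensional vectors
W_user = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.0]]  # 2 x 3: maps 3 user features -> 2
W_item = [[2.0, 0.0],
          [0.0, 1.0]]       # 2 x 2: maps 2 movie features -> 2

v_u = l2_normalize(dense(W_user, x_u))
v_m = l2_normalize(dense(W_item, x_m))
y_hat = sum(a * c for a, c in zip(v_u, v_m))  # v_u . v_m, a cosine similarity in [-1, 1]
print(y_hat)
```

In the real model the two towers are deep networks trained jointly, but the shape logic is the same: the input sizes can differ as long as both output layers have the same number of units.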

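To train this model, both networks are trained together as one combined model, with the cost function:

J = \sum_{(i,j):r(i,j)=1}L\left(v_u^{(j)} \cdot v_m^{(i)},\, y^{(i,j)}\right) + \text{NN regularization term}

Here the parameters being learned are the weights of the two networks rather than $w$, $b$, $x$.

The TensorFlow implementation is: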

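```python
import tensorflow as tf

# num_user_features / num_item_features: input sizes of the user and item feature
# vectors (assumed defined; they can differ). Both towers end in 32 units so v_u and v_m match.
user_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

item_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# User tower: x_u -> v_u. UnitNormalization L2-normalizes each row, like
# tf.linalg.l2_normalize(axis=1), but as a layer it also works on symbolic Keras 3 tensors
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
vu = tf.keras.layers.UnitNormalization(axis=1)(vu)

# Item tower: x_m -> v_m
input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = tf.keras.layers.UnitNormalization(axis=1)(vm)

# Prediction y_hat = v_u . v_m (a cosine similarity after normalization,
# so targets are assumed scaled to [-1, 1])
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# Both towers are trained jointly through this single combined model
model = tf.keras.Model([input_user, input_item], output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss=tf.keras.losses.MeanSquaredError())
```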

With a large catalogue, inference becomes a bottleneck: we would have to run forward propagation through the two neural networks for every item in the catalogue. The fix is a two-stage architecture. The first stage, Retrieval, generates plausible candidates using fast, approximate methods such as top movies in the user's region, ...; combining several of these retrieval methods, we end up with about 100-1000 candidates. The second stage, Ranking, passes these candidates through the model and sorts them by predicted score.

This split works because retrieval is cheap but inexact, whereas ranking is expensive but accurate.
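The two stages can be sketched in plain Python as below. The retrieval methods and the model score are hypothetical stand-ins (simple functions and a lookup table) for real retrieval heuristics and the trained two-tower model; `retrieve` and `rank` are illustrative names, not a library API.

```python
def retrieve(user, retrieval_methods, max_candidates=1000):
    """Stage 1 (Retrieval): union the outputs of several cheap, approximate methods."""
    candidates, seen = [], set()
    for method in retrieval_methods:
        for item in method(user):
            if item not in seen:  # drop duplicates, keep first-seen order
                seen.add(item)
                candidates.append(item)
    return candidates[:max_candidates]


def rank(user, candidates, score, top=10):
    """Stage 2 (Ranking): run the expensive model only on the retrieved candidates."""
    return sorted(candidates, key=lambda item: score(user, item), reverse=True)[:top]


# Hypothetical stand-ins: two cheap retrieval methods and a "model" score lookup
def top_in_region(user):
    return ["m1", "m2", "m3"]


def similar_to_recently_watched(user):
    return ["m3", "m4"]


MODEL_SCORES = {"m1": 0.2, "m2": 0.9, "m3": 0.5, "m4": 0.7}


def model_score(user, item):
    return MODEL_SCORES[item]


candidates = retrieve("alice", [top_in_region, similar_to_recently_watched])
print(rank("alice", candidates, model_score, top=2))  # ['m2', 'm4']
```

The expensive `score` call runs only on the deduplicated candidate list, never on the full catalogue, which is the whole point of the split.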

Bugs & Blockers

N/A

Concepts That Need More Time

Still trying to wrap my head around the implementation of collaborative filtering; I might need to implement it from scratch without TensorFlow to get a better feel for what I'm doing.

Also still struggling a bit with TensorFlow's auto-diff (`tf.GradientTape`), which is new to me, and how to use it; I might need to explore some examples and projects to get a better understanding.

Tomorrow

Continue Course 3 of the Machine Learning Specialization (the optional PCA part; I've done PCA before, but it's still worth checking)

Will probably start the Reinforcement Learning part as well

Wins

Used both collaborative filtering and content-based filtering to build a movie recommendation algorithm

Wrote the spec for the recommender algorithm for a side gig I got recently

#ml