“How’s that movie?” — Neural collaborative filtering with FastAI

Build a state-of-the-art movie recommendation system with just 10 lines of code

Source: Unsplash and 3Blue1Brown

The MovieLens 100K Dataset

The MovieLens 100K dataset is a collection of movie ratings by 943 users on 1682 movies. There are only 100,000 ratings in total (out of roughly 1.6 million possible user-movie pairs), since not every user has seen and rated every movie. Each rating is recorded as a tuple of user, movie, rating (1 to 5 stars) and timestamp.

System Setup

If you want to follow along as we build this model, a fully reproducible Jupyter notebook for this tutorial can be found hosted on Jovian:

pip install jovian --upgrade     # Install the jovian library 
jovian clone a1b40b04f5174a18bd05b17e3dffb0f0 # Download notebook
cd movielens-fastai # Enter the created directory
jovian install # Install the dependencies
conda activate movielens-fastai # Activate virtual environment
jupyter notebook # Start Jupyter

Preparing the data

You can download the MovieLens 100K dataset from the GroupLens website. Once downloaded, extract it into a directory ml-100k next to the Jupyter notebook. As described in the README, the file u.data contains the list of ratings.
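To make the format concrete, here's a sketch of how the ratings can be loaded into a Pandas dataframe (the column names userId, movieId, rating and timestamp are my own choices; the file itself is tab-separated, as documented in the README):

import pandas as pd

# u.data is tab-separated: user id, movie id, rating (1-5), unix timestamp
ratings = pd.read_csv('ml-100k/u.data', delimiter='\t', header=None,
                      names=['userId', 'movieId', 'rating', 'timestamp'])
ratings.head()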

Next, we create a DataBunch from this dataframe. It handles three things for us:

  1. Splits the data into a training set and a validation set
  2. Creates data loaders to access the data in batches
  3. Checks if a GPU is available, and moves the data to it
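A minimal sketch of this step, assuming FastAI v1's CollabDataBunch and the column names from the dataframe above (the seed, validation split and batch size are illustrative choices):

from fastai.collab import CollabDataBunch

# Split into train/validation sets, create batched data loaders,
# and move the data to the GPU if one is available
data = CollabDataBunch.from_df(ratings, seed=42, valid_pct=0.1,
                               user_name='userId', item_name='movieId',
                               rating_name='rating', bs=64)
data.show_batch()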

Neural collaborative filtering model

The model itself is quite simple. We represent each user u and each movie m by a vector of a predefined length n. The rating for movie m by user u, as predicted by the model, is simply the dot product of the two vectors, plus a learned bias term for the user and for the movie.

Source: FastAI Lesson 4
Source: Wikipedia
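To make this concrete, here is a rough PyTorch sketch of such a dot-product model (the class name and the choice of n_factors are illustrative; FastAI's own implementation, which we'll look at in a future post, differs in its details):

import torch.nn as nn

class DotProductModel(nn.Module):
    def __init__(self, n_users, n_movies, n_factors=40):
        super().__init__()
        # One length-n vector (and one bias) per user and per movie
        self.user_vecs = nn.Embedding(n_users, n_factors)
        self.movie_vecs = nn.Embedding(n_movies, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.movie_bias = nn.Embedding(n_movies, 1)

    def forward(self, users, movies):
        # Predicted rating = dot product of the two vectors, plus the biases
        dot = (self.user_vecs(users) * self.movie_vecs(movies)).sum(dim=1)
        return dot + self.user_bias(users).squeeze(1) + self.movie_bias(movies).squeeze(1)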

Training the model

The learner uses the mean squared error loss function to evaluate the predictions of the model, and the Adam optimizer to adjust the parameters (vectors and biases) using gradient descent. Before we train the model, we use the learning rate finder to select a good learning rate for the optimizer.
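Assuming the DataBunch created earlier, this step might look roughly like the following (the number of factors, rating range, weight decay, number of epochs and learning rate are illustrative choices, not necessarily the values used in the notebook):

from fastai.collab import collab_learner

# Create a learner: embeddings + biases, MSE loss and the Adam optimizer by default
learn = collab_learner(data, n_factors=40, y_range=(0, 5.5), wd=1e-1)

# Find a good learning rate, then train with the one-cycle policy
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, 1e-2)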

Looking at some predictions

While it’s great to see the loss go down, let’s look at some actual predictions of the model.
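One way to do this, assuming the FastAI v1 learner from above, is to pull the model's predictions for the validation set and compare them with the actual ratings:

from fastai.basic_data import DatasetType

# Predictions and actual ratings for the validation set
preds, targets = learn.get_preds(ds_type=DatasetType.Valid)

for pred, target in list(zip(preds, targets))[:5]:
    print(f'predicted: {pred.item():.2f}   actual: {target.item():.1f}')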

Source: Netflix

Save and commit

As a final step, we can save and commit our work using the jovian library.
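A quick sketch of what this might look like from inside the notebook (the model filename is an illustrative choice):

import jovian

# Save the trained model weights alongside the notebook
learn.save('movielens-model')

# Record the notebook and its environment on Jovian
jovian.commit()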

Further Reading

In a future post, we’ll dive deeper and see how DataBunch and collab_learner are actually implemented, using PyTorch. We'll also explore how we can interpret the vectors and biases learned by the model, and see some interesting results.

  • Paper introducing neural collaborative filtering
  • PyTorch: Zero to GANs — a tutorial series covering the basics of PyTorch and neural networks
