Collaborative Filtering – Netflix movie reviews


This project implements a collaborative filtering algorithm that predicts movie ratings from a dataset of Netflix ratings.


The dataset is a subset of the original movie ratings data from the Netflix Prize. Each row in the .txt file represents an observation with three fields: Movie ID, Customer ID, Rating.
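As an illustration of the row layout described above, here is a minimal loading sketch. The function name `load_ratings`, the comma separator, and the integer IDs are assumptions made for this example; the project's actual parsing code may differ.

```python
from collections import defaultdict


def load_ratings(path, sep=","):
    """Return {customer_id: {movie_id: rating}} read from a ratings text file.

    Assumes one observation per line in the order Movie ID, Customer ID, Rating,
    separated by `sep` (a comma here, which is an assumption).
    """
    ratings = defaultdict(dict)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            movie_id, customer_id, rating = line.split(sep)
            ratings[int(customer_id)][int(movie_id)] = float(rating)
    return dict(ratings)
```

Grouping by customer first makes it cheap to look up everything a given user has rated, which is what a user-to-user similarity needs.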


The similarity measure used is the Pearson correlation coefficient. Given users i and j, let $I_i$ be the set of movies that user i has rated, and let $\bar{R}_i$ be the average rating for user i.
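For reference, a common way to write the Pearson coefficient under these definitions is shown here. The symbol $w_{ij}$ for the similarity and the choice to sum over the movies both users have rated are assumptions of this sketch, not necessarily the exact form used in the project:

$$
w_{ij} = \frac{\sum_{k \in I_i \cap I_j} (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_{k \in I_i \cap I_j} (R_{ik} - \bar{R}_i)^2 \; \sum_{k \in I_i \cap I_j} (R_{jk} - \bar{R}_j)^2}}
$$

Here $R_{ik}$ denotes the rating of user i on movie k. The value lies in [-1, 1], with 1 meaning the two users deviate from their own averages in the same way on the movies they share.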

Let $R_{ik}$ be the rating of user i on movie k. The prediction is generated according to the following formula:
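A widely used form of this prediction in memory-based collaborative filtering adds a similarity-weighted average of other users' deviations to the target user's mean. It is given here as an assumption, with $w_{ij}$ standing for the Pearson similarity between users i and j and $R^*_{ik}$ for the predicted rating:

$$
R^*_{ik} = \bar{R}_i + \frac{\sum_{j} w_{ij}\,(R_{jk} - \bar{R}_j)}{\sum_{j} |w_{ij}|}
$$

where the sums run over the users j who have rated movie k. A minimal Python sketch of both steps follows. The function names, the nested-dictionary layout, and the fallbacks to 0 and to the user's mean are hypothetical choices for this example, not the project's actual code.

```python
from math import sqrt


def mean_rating(user_ratings):
    """Average of all ratings given by one user."""
    return sum(user_ratings.values()) / len(user_ratings)


def pearson(ratings, i, j):
    """Pearson similarity between users i and j over the movies both rated.

    `ratings` maps user -> {movie: rating}. Returns 0.0 when the users share
    no movies or when either user's deviations are all zero (an assumption).
    """
    common = ratings[i].keys() & ratings[j].keys()
    if not common:
        return 0.0
    mean_i, mean_j = mean_rating(ratings[i]), mean_rating(ratings[j])
    num = sum((ratings[i][k] - mean_i) * (ratings[j][k] - mean_j) for k in common)
    den_i = sum((ratings[i][k] - mean_i) ** 2 for k in common)
    den_j = sum((ratings[j][k] - mean_j) ** 2 for k in common)
    den = sqrt(den_i * den_j)
    return num / den if den > 0 else 0.0


def predict(ratings, i, k):
    """Predict user i's rating of movie k from the other users who rated k."""
    mean_i = mean_rating(ratings[i])
    num = 0.0
    den = 0.0
    for j, user_ratings in ratings.items():
        if j == i or k not in user_ratings:
            continue
        w = pearson(ratings, i, j)
        num += w * (user_ratings[k] - mean_rating(user_ratings))
        den += abs(w)
    # Fall back to the user's own mean when no similar user rated movie k.
    return mean_i + num / den if den > 0 else mean_i
```

This version recomputes similarities on every call, which keeps the sketch short. On a dataset the size of the Netflix ratings, caching the per-user means and similarities would matter.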


To run the code and build the predictions, type the following in the terminal:

python --train data/training.txt --test data/testing.txt


  • Yannet Interian – Advanced Machine Learning – Course notes
  • Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman – Mining of Massive Datasets

