Movie recommendation system in Python

Introduction:

Making a movie recommendation system in Python is a lot easier than you might think. In this tutorial I will show you how to build one, using the data we scraped in an earlier tutorial. I won't bore you with the concepts behind recommendation systems here; to find out how recommendation systems work, click here.

Getting started:

As mentioned above, we will use the movie data we scraped in the earlier tutorial, and we will need the following modules:

pip3 install pandas
pip3 install numpy
pip3 install scikit-learn

Our movie recommendation system is content-based: we recommend movies to a user based on how similar they are to a movie we already know about. The system uses cosine similarity to do this. Given a movie, the model finds the movies most similar to it based on their genres, views, rating and cast.

The algorithm:

Our movie recommendation model is a content-based recommender, which means it has to measure the similarity between movies.

The algorithm is pretty simple:

Given a movie A, find all items in S (the set of movies in our dataset) that are similar to A.
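
To make "similar" a bit more concrete: cosine similarity compares two feature vectors by the angle between them, cos(θ) = (a · b) / (‖a‖ ‖b‖). Below is a minimal sketch in plain NumPy. The three vectors are made-up word counts for illustration, not values from the scraped dataset, and the function name cosine is my own.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical word-count vectors for three titles (illustration only).
movie_a = [1, 1, 0, 1]
movie_b = [1, 1, 0, 0]
movie_c = [0, 0, 1, 0]

print(cosine(movie_a, movie_b))  # A and B share two words, so the score is high
print(cosine(movie_a, movie_c))  # A and C share no words, so the score is 0.0
```

A score close to 1 means the two titles have nearly the same features; a score of 0 means they have nothing in common.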

The movie recommendation system in Python:

Let’s kick things off by importing the required modules:

import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Let’s open our movie dataset and get it ready for processing:

path = './storage/'
cv = CountVectorizer()
df = pd.read_csv(path+"Tvseries_Dataset.csv")
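
The rest of the code reads a handful of columns from the CSV, so it can help to check they are present before going further. The sketch below does that on a tiny stand-in CSV. Only the column names come from the code in this tutorial; the rows and the names sample_csv, sample_df and required are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for Tvseries_Dataset.csv; the rows are invented.
sample_csv = io.StringIO(
    "index,movie_name,genres,views,rating,casts\n"
    "0,Show A,Drama Fantasy,1200,8.1,Actor One Actor Two\n"
    "1,Show B,Comedy,800,7.4,Actor Three\n"
)
sample_df = pd.read_csv(sample_csv)

# Columns that the functions in this tutorial read from.
required = ["index", "movie_name", "genres", "views", "rating", "casts"]
missing = [col for col in required if col not in sample_df.columns]
print("missing columns:", missing)  # missing columns: []
```

Running the same check against the real df should print an empty list; anything else means the CSV layout differs from what the code expects.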

We will need two helper functions, get_title_from_index and get_index_from_title.

get_index_from_title returns the index of a movie given its name, while get_title_from_index returns the name of a movie given its index.

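def get_title_from_index(index):
    # Find the row whose DataFrame index equals `index` and return its title.
    return df[df.index == index]["movie_name"].values[0]

def get_index_from_title(title):
    # Relies on the dataset having an "index" column that holds each row's position.
    return df[df.movie_name == title]["index"].values[0]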
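
Both helpers raise an IndexError when nothing matches, for example when a title is misspelled. One possible guard is sketched below. find_index is a hypothetical helper name, and the small DataFrame stands in for the real dataset.

```python
import pandas as pd

# Small stand-in for the real dataset (invented rows).
demo_df = pd.DataFrame({"movie_name": ["Show A", "Show B", "Show C"]})

def find_index(frame, title):
    """Return the index label of `title`, or None if it is not present.

    The comparison ignores case and surrounding whitespace. With the default
    RangeIndex, the label is the same as the row position.
    """
    wanted = title.strip().lower()
    names = frame["movie_name"].str.strip().str.lower()
    matches = names[names == wanted]
    return int(matches.index[0]) if not matches.empty else None

print(find_index(demo_df, " show b "))      # 1
print(find_index(demo_df, "Unknown Show"))  # None
```

Returning None lets the caller decide what to do with an unknown title instead of crashing.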

Moving forward, we need to combine the features our model will use to make recommendations.

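def combine_features(row):
    features = ["genres", "views", "rating", "casts"]
    # Always return a plain string (CountVectorizer cannot process anything else);
    # missing values become empty strings.
    return " ".join("" if pd.isna(row[f]) else str(row[f]) for f in features)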

The function above joins the features we want: genres, views, rating and casts.
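
To see what the model actually receives, here is the same joining idea applied to one invented row. The column names match the ones used in this tutorial; the values are made up.

```python
import pandas as pd

# Invented row using the column names from this tutorial.
row = pd.Series({
    "genres": "Drama Fantasy",
    "views": 1200,
    "rating": 8.1,
    "casts": "Actor One",
})

# Join the four features into one space-separated string.
combined = " ".join(str(row[col]) for col in ["genres", "views", "rating", "casts"])
print(combined)  # Drama Fantasy 1200 8.1 Actor One
```

Every movie ends up as one short "document" like this, and the similarity is later computed between these documents.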

The model:

First, we combine the features we want and add the result to our pandas DataFrame as a new column:

df["combined_features"] = df.apply(combine_features,axis=1)

Second, we transform the combined features into a matrix of word counts so that we can compute the cosine similarity:

count_matrix = cv.fit_transform(df["combined_features"])
cosine_sim = cosine_similarity(count_matrix)
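
Here is a small sketch of what these two calls produce, using three invented feature strings rather than the real dataset. One thing worth knowing: with its default settings, CountVectorizer lowercases the text and only keeps tokens of two or more characters, so a rating such as 8.1 is split at the dot and both digits are dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented combined-feature strings (illustration only).
docs = [
    "Drama Fantasy 1200 8.1 Actor One",
    "Drama Fantasy 900 7.9 Actor Two",
    "Comedy 300 6.5 Someone Else",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)  # sparse matrix, one row per string
sim = cosine_similarity(matrix)          # dense 3 x 3 array of pairwise scores

print(sorted(vectorizer.vocabulary_))    # the tokens that were kept
print(sim.round(2))
```

The first two strings share three of their five tokens, so their score is 0.6, while the third shares nothing with either and scores 0. If the rating should influence the result, one option (my suggestion, not part of the original code) is to turn it into a single token first, for example by rounding it to a whole number.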

Next, we look up the index of the movie we are interested in and pair every movie in the dataset with its similarity score to that movie:

movie_name = "Wynonna Earp"  # the example title used later in this tutorial
movie_index = get_index_from_title(movie_name)
similar_movies = list(enumerate(cosine_sim[movie_index]))

We then sort the similar_movies list by score, in descending order, so that the most similar movies come first:

sorted_similar_movies = sorted(similar_movies,key=lambda x:x[1],reverse=True)

As the final step, we convert the movie indexes back to their actual names:

listitems = []
for element in sorted_similar_movies:
    listitems.append(get_title_from_index(element[0]))

Putting it all together:

import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cv = CountVectorizer()
df = pd.read_csv("Tvseries_Dataset.csv")

def get_title_from_index(index):
    return df[df.index == index]["movie_name"].values[0]

def get_index_from_title(title):
    return df[df.movie_name == title]["index"].values[0]

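def combine_features(row):
    features = ["genres", "views", "rating", "casts"]
    # Always return a plain string (CountVectorizer cannot process anything else);
    # missing values become empty strings.
    return " ".join("" if pd.isna(row[f]) else str(row[f]) for f in features)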


def recommend(movie_name):
    # Build the combined feature strings and the pairwise similarity matrix.
    df["combined_features"] = df.apply(combine_features, axis=1)
    count_matrix = cv.fit_transform(df["combined_features"])
    cosine_sim = cosine_similarity(count_matrix)
    # Score every title against the requested one, most similar first.
    movie_index = get_index_from_title(movie_name)
    similar_movies = list(enumerate(cosine_sim[movie_index]))
    sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)
    # Convert indexes back to titles; the first entry is normally the movie itself, so skip it.
    listitems = [get_title_from_index(element[0]) for element in sorted_similar_movies]
    return listitems[1:]
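
One edge case worth knowing about: recommend drops the first entry of the sorted list on the assumption that the queried title always ranks first. If another title has exactly the same features, it also scores 1.0, and because Python's sort keeps tied items in their original order, the queried title may not be the one that gets dropped. The sketch below shows the problem and one possible fix, filtering by index instead of by position. The titles and scores are invented.

```python
import numpy as np

# Invented similarity scores of every title to the title at position 2.
titles = ["Show A", "Show B", "Show C", "Show D"]
scores = np.array([0.20, 1.00, 1.00, 0.75])
query_index = 2

# Dropping the first sorted entry removes "Show B" and keeps the query itself.
naive = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)[1:]
print([titles[i] for i, _ in naive])  # ['Show C', 'Show D', 'Show A']

# Filtering by index removes exactly the queried title.
ranked = sorted(
    ((i, s) for i, s in enumerate(scores) if i != query_index),
    key=lambda pair: pair[1],
    reverse=True,
)
top = [titles[i] for i, _ in ranked[:2]]
print(top)  # ['Show B', 'Show D']
```

In recommend, the same idea would mean skipping the pair whose index equals movie_index before building the list of names.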

Our recommender system in action:

Let’s find the 5 movies most similar to Wynonna Earp, based on the data in our dataset:

items = recommend("Wynonna Earp")
for title in items[:5]:
    print(title)

[Image: Movie recommendation system in Python]

Not bad, huh? There is still room for improvement, though. You can find the full code at my GitHub repo. If you have any questions, please use the comment section; I promise to get back to you ASAP.
