Restaurant Recommender System

Catherine Gitau
Jan 26, 2021
Photo by Mgg Vitchakorn on Unsplash

About 6 months ago, after a rigorous selection process, I was selected, along with 15 other participants, to join the Africa Data Science Intensive (DSI) program for 2020. This is a hands-on training program in data science that is based on solving real-world problems. During the program, I had the opportunity to work on several projects covering topics like Linear Regression, Logistic Regression, Natural Language Processing and Convolutional Neural Networks, just to name a few.

Some of the projects I got to work on are:

Looking back, this training has given me a strong understanding of the whole data science process, from writing algorithms to deploying models to production, and has familiarized me with the tools that make this possible. We have now come to the end of the program and are concluding our final capstone projects, which showcase all that we have learned during the training. This blog post explains the project that I decided to work on and the end-to-end process that I went through to build and showcase my models.

Restaurant Recommendation System

For my final project, I wanted to work on something I had never worked on before, both to get familiar with a new area of data science and to build something that might be useful to someone at some point. This led me to build a recommendation system and, because of my love for food, I decided on one that recommends restaurants near a person’s location. The recommendations draw on factors such as other people’s restaurant reviews: sentiments extracted from those reviews improve the suggestions and give the user insight into which restaurant they would most enjoy eating at.

Recommender systems are among the most popular applications of data science at the moment. They are used to predict the “rating” or “preference” that a user would give to an item. Companies that actively use recommendation systems include Amazon, YouTube, Netflix and Facebook.

Image from Yelp Website

For this project, I used the Yelp data set, a subset of Yelp’s business, review and user data that was made publicly available for personal, educational and academic purposes. Since the full data set was too large to work with, I narrowed it down to restaurants in the city of Toronto, because it had the largest number of reviews compared to the other 9 cities in the data set. After filtering, I ended up with a total of 5,471 businesses, 44,485 users and 23,050 reviews.
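The filtering step can be sketched roughly as below. This is a minimal illustration rather than the code from the notebooks: it assumes each business record is one JSON object per line with `city` and `categories` fields, that `categories` is a comma-separated string, and that the category names match the ones listed in the EDA section. The sample records and the names `WANTED`, `is_wanted` and `filter_businesses` are hypothetical.

```python
import json

# Categories kept for this project (assumption: Yelp spells the brunch
# category with an ampersand).
WANTED = {"Restaurants", "Fast Food", "Breakfast & Brunch", "Cafes"}

def is_wanted(business, city="Toronto", wanted=WANTED):
    """True if the business is in `city` and has at least one wanted category."""
    raw = business.get("categories") or ""          # categories may be null
    categories = {c.strip() for c in raw.split(",")}
    return business.get("city") == city and bool(categories & wanted)

def filter_businesses(lines):
    """Parse JSON-lines records and keep only the wanted businesses."""
    return [b for b in map(json.loads, lines) if is_wanted(b)]

# Hypothetical records standing in for lines of the business file.
sample_lines = [
    '{"business_id": "b1", "city": "Toronto", "categories": "Restaurants, Chinese"}',
    '{"business_id": "b2", "city": "Toronto", "categories": "Hair Salons"}',
    '{"business_id": "b3", "city": "Las Vegas", "categories": "Restaurants, Pizza"}',
    '{"business_id": "b4", "city": "Toronto", "categories": null}',
]
toronto = filter_businesses(sample_lines)
```

Reviews and users would then be restricted to those linked to the remaining business IDs.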

I went through several steps, which I will take you through:

  • Exploratory Data Analysis
  • Sentiment Analysis
  • Topic Modelling
  • Recommendation system models

Exploratory Data Analysis

Exploratory data analysis (EDA) is an approach to analyzing data sets so as to summarize their main characteristics. I performed EDA on the review, business and user data.

Image showing restaurants in Toronto city

The map above was built using the Plotly package in Python. It shows a geographical view of the restaurants in the city of Toronto. For the purpose of this project, I restricted the businesses in the filtered data sets to the following categories: Restaurants, Fast Food, Breakfast and Brunch, and Cafes. The diagram below shows the distribution of the categories of the businesses that remained after filtering.

From the above, it seems that the majority of the restaurants in Toronto fall under nightlife and bars. The diagram below shows the distribution of restaurant ratings from Yelp users.

Restaurant ratings distribution

From the above, it seems that most of the ratings are 4 or 5 stars.

After exploring the data, I had a rough idea of the data that I was working with. The next step was to perform sentiment analysis on the review data.

Sentiment Analysis

Sentiment analysis is a natural language processing technique used to determine whether a particular text is positive, negative or neutral. It is mostly used to help businesses monitor brand and product sentiment in customer feedback so as to understand their customers’ needs.

I made use of the text reviews that users gave for every restaurant they visited and computed a score indicating whether each text was positive or negative. I used TextBlob’s polarity score. TextBlob is a Python library that offers a simple API for performing basic NLP tasks. Its sentiment property returns two values: polarity and subjectivity. Polarity is a float in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement. I combined the star rating of the restaurant with the sentiment polarity to get an overall score representing the customers’ experiences. This score is later used to build the recommendation models.

Negative Yelp Restaurant reviews
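As a rough illustration of this step, the sketch below reads the polarity from TextBlob and blends it with the star rating. The `sentiment.polarity` property is TextBlob's own; the blending rule (mapping polarity onto the 1 to 5 star scale and taking a weighted average) is purely an assumption for illustration, since the exact formula is not given here, and the function names are hypothetical.

```python
def review_polarity(text):
    """Polarity in [-1, 1] from TextBlob (needs the textblob package)."""
    from textblob import TextBlob  # imported lazily so the rest runs without it
    return TextBlob(text).sentiment.polarity

def combined_score(stars, polarity, weight=0.5):
    """Blend a 1-5 star rating with a [-1, 1] polarity into one 1-5 score.

    The blending rule is an assumption, not the formula used in the project.
    """
    if not 1 <= stars <= 5 or not -1 <= polarity <= 1:
        raise ValueError("stars must be in [1, 5] and polarity in [-1, 1]")
    polarity_as_stars = 3 + 2 * polarity  # -1 -> 1 star, 0 -> 3, +1 -> 5
    return (1 - weight) * stars + weight * polarity_as_stars

# A 4-star review whose text is moderately positive.
example = combined_score(stars=4, polarity=0.5)
```

With `weight=0.5` the stars and the text count equally; a lower weight would trust the star rating more.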

Topic Modelling

Topic modelling is a type of statistical modelling for discovering the “topics” that occur within a collection of text documents. It is frequently used to discover hidden semantic structures in text. This was another key feature that I could extract from the reviews data. I used Latent Dirichlet Allocation (LDA), a topic modelling method, to assign the text in a document to a particular topic.

Topic from reviews

LDA assumes documents are produced from a mixture of topics. Those topics then generate words based on their probability distribution. Given a data set of documents, LDA backtracks and tries to figure out what topics would create those documents in the first place. You can read more about how LDA works here. Using LDA, I was able to extract a few dominant topics that gave interesting insights into what people were talking about in their Yelp reviews. This data is useful for building one of the recommendation models, which I describe below.
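The generative story that LDA assumes can be sketched in a few lines. This toy example only shows the forward direction (topics produce words); it is not the inference step and not the code used in the project. The two topics, their word probabilities and the `alpha` value are made up for illustration.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a length-k probability vector from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, n_words, alpha=0.5, rng=None):
    """Generate one document: pick a topic mixture, then a topic and word per position."""
    rng = rng or random.Random()
    mixture = sample_dirichlet(alpha, len(topics), rng)  # this document's topic weights
    words = []
    for _ in range(n_words):
        topic = rng.choices(range(len(topics)), weights=mixture)[0]
        vocab, probs = zip(*topics[topic].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return mixture, words

# Hypothetical topics: each maps words to probabilities.
topics = [
    {"pizza": 0.5, "pasta": 0.3, "cheese": 0.2},   # a "food" topic
    {"service": 0.5, "wait": 0.3, "staff": 0.2},   # a "service" topic
]
mixture, review = generate_document(topics, n_words=8, rng=random.Random(0))
```

Fitting LDA runs this story in reverse: given only the reviews, it estimates the per-topic word probabilities and each review's topic mixture.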

Recommendation System Models

For this capstone project, I ended up building 3 recommendation models:

  • Location-based recommendation system
  • Content-based recommendation system
  • Collaborative filtering recommendation system

Location-based recommendation system

This is a simple recommendation model that gives new users restaurant recommendations based on where they are, suggesting restaurants that are around them. If we know a person’s location at the moment they are trying to figure out which new restaurant to visit, we can use a location-based recommendation system to recommend restaurants nearby. One way to group restaurants by geographical location is the K-Means clustering algorithm. The model predicts the cluster in which the user is located, pulls out that cluster’s top 10 restaurants and recommends them to the user.

Restaurants in Toronto grouped into 10 clusters
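A minimal sketch of this idea follows, using a plain K-Means loop on (latitude, longitude) pairs. It is an illustration under assumptions, not the project code: the toy restaurants, the use of straight-line distance on coordinates, and ranking a cluster's restaurants by a `score` field are all hypothetical choices. The toy data uses 2 clusters, whereas the project used 10.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (straight-line distance on lat/lon)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids as (lat, lon) tuples."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if it has none.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def recommend_nearby(user_location, restaurants, centroids, top_n=10):
    """Top-scoring restaurants from the cluster the user falls into."""
    cluster = nearest(user_location, centroids)
    in_cluster = [r for r in restaurants if nearest(r["location"], centroids) == cluster]
    return sorted(in_cluster, key=lambda r: r["score"], reverse=True)[:top_n]

# Hypothetical restaurants in two separate neighbourhoods.
restaurants = [
    {"name": "A1", "location": (43.650, -79.380), "score": 3.9},
    {"name": "A2", "location": (43.651, -79.382), "score": 4.6},
    {"name": "A3", "location": (43.653, -79.379), "score": 4.2},
    {"name": "B1", "location": (43.770, -79.410), "score": 4.9},
    {"name": "B2", "location": (43.771, -79.412), "score": 4.1},
    {"name": "B3", "location": (43.773, -79.409), "score": 3.5},
]
centroids = kmeans([r["location"] for r in restaurants], k=2)
picks = recommend_nearby((43.652, -79.381), restaurants, centroids, top_n=2)
```

A user standing in the first neighbourhood only sees restaurants from that cluster, even though the best-scoring restaurant overall is in the other one.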

Content-based recommendation system

Content-based recommenders suggest items that are similar to a particular item. They use data such as the description, genre or type of restaurant to make these recommendations. The general idea behind this kind of recommendation system is that if a person likes a particular item, he or she will also like an item that is similar to it. In this case, if someone has previously visited a Chinese restaurant and liked it, the recommendation system will suggest other good Chinese restaurants that this person might like to visit.

I built this recommendation system on restaurant categories and dominant topic keywords, so that it suggests restaurants that align with a user’s preferences.

For my content based recommendation system, I took the following steps:

  1. Creating a bag of words for each restaurant’s reviews: this bag of words extracts the features (text) from the reviews of each restaurant.
  2. Count vectorizer: the count vectorizer converts each bag of words into a vector of token counts, which makes it possible to compute cosine similarity.
  3. Creating a cosine similarity matrix: the cosine similarity matrix holds a numeric value that denotes how similar each pair of restaurants is.

Diagram showing content-based recommender system in action
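The three steps above can be sketched with the standard library alone. This is a stand-in for illustration, not the project code: `Counter` plays the role of the count vectorizer, and the restaurant names and text are made up. Libraries such as scikit-learn provide `CountVectorizer` and `cosine_similarity` for the same job at scale.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Token counts for one restaurant's text (lowercased, split on whitespace)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(name, bags, top_n=5):
    """Other restaurants ranked by similarity to `name`, best first."""
    scored = [(other, cosine_similarity(bags[name], bag))
              for other, bag in bags.items() if other != name]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical per-restaurant text: categories plus dominant topic keywords.
restaurant_text = {
    "Golden Wok": "chinese noodles dumplings service",
    "Dragon House": "chinese dumplings tea",
    "Luigi's": "italian pizza pasta service",
}
bags = {name: bag_of_words(text) for name, text in restaurant_text.items()}
similar = most_similar("Golden Wok", bags, top_n=2)
```

Here the other Chinese restaurant ranks first because it shares two tokens with the query, while the Italian one shares only "service".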

Collaborative filtering recommendation system

Collaborative filtering systems try to predict the rating or preference that a user would give an item based on the past ratings and preferences of other users. For example, suppose Cate and John have a similar interest in some restaurants. If a new restaurant comes up and Cate likes it, there’s a high chance that John will like it too, so the system recommends this restaurant to John as well.

Similar interests

My collaborative-filtering recommendation model works by searching a large group of people and finding users with interests similar to those of a particular user. It then looks at the restaurants they like and combines them to create a ranked list of suggested restaurants. I took the following steps:

  1. Pivot table on scores to create a user-item matrix
  2. Truncated Singular Value Decomposition
  3. Creating an item-item similarity matrix based on cosine similarity
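These steps can be sketched with NumPy as follows. This is a minimal illustration under assumptions rather than the project code: the ratings are made up, missing scores are filled with 0, and the truncated SVD is done by keeping the top components of a full SVD. In practice the pivot would more likely come from something like pandas' `pivot_table`.

```python
import numpy as np

# Hypothetical (user, restaurant, score) triples.
ratings = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 1),
    ("u3", "C", 5), ("u3", "D", 4),
    ("u4", "A", 1), ("u4", "C", 4), ("u4", "D", 5),
]

# Step 1: pivot into a user-item matrix (rows = users, columns = restaurants).
users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})
user_item = np.zeros((len(users), len(items)))
for u, i, s in ratings:
    user_item[users.index(u), items.index(i)] = s

def item_similarity(user_item, n_components=2):
    """Steps 2 and 3: reduce items to a few latent factors, then compare them."""
    # SVD of the item-user matrix; keep only the strongest components.
    U, S, _ = np.linalg.svd(user_item.T, full_matrices=False)
    factors = U[:, :n_components] * S[:n_components]
    # Cosine similarity = dot product of length-normalised rows.
    norms = np.linalg.norm(factors, axis=1, keepdims=True)
    normalised = factors / np.where(norms == 0, 1.0, norms)
    return normalised @ normalised.T

def similar_items(name, items, similarity, top_n=3):
    """Restaurants most similar to `name`, best first."""
    idx = items.index(name)
    order = np.argsort(-similarity[idx])
    return [items[j] for j in order if j != idx][:top_n]

similarity = item_similarity(user_item)
suggestions = similar_items("A", items, similarity)
```

Restaurants A and B are rated highly by the same users, so B comes out as the closest match to A; someone who liked A would be recommended B first.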

Conclusion

In this post I have shown you how I went about creating three different recommendation models: location-based, content-based and collaborative filtering. I built a simple graphical user interface (GUI) to explore these models and see how they would work in real life. You can find the code for the notebooks as well as the code for building the GUI on my GitHub page here.
