photo credit: pixabay

Building A Recommender System With Implicit Feedback Datasets Using Alternating Least Squares

RecSys, ALS, Collaborative Filtering

The Data

The data we are using today is the Online Retail dataset from the UCI Machine Learning Repository. For our matrix factorization we will be using the Implicit library, a fast Python collaborative filtering library for implicit datasets.

online_retail_data.py
Figure 1
  • Many rows are missing “CustomerID”, so we will have to remove those rows.
  • Group by “CustomerID” and “StockCode”, then sum the “Quantity”, so that we get one row for each customer-item interaction.
  • If “Quantity” = 0, we change it to one.
  • Eliminate negative “Quantity”.
clean_retail_data.py
Table 1
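The four cleaning steps above can be sketched roughly as follows. This is a minimal sketch on made-up rows, not the contents of clean_retail_data.py; the function name clean_retail and the toy values are assumptions for illustration, and only the column names come from the dataset.

```python
import pandas as pd


def clean_retail(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to raw transaction rows (hypothetical helper)."""
    # 1. Remove rows where CustomerID is missing.
    df = df.dropna(subset=["CustomerID"])
    # 2. One row per (customer, item) pair, with the summed Quantity.
    grouped = df.groupby(["CustomerID", "StockCode"], as_index=False)["Quantity"].sum()
    # 3. A net Quantity of 0 still means the customer interacted with the item.
    grouped.loc[grouped["Quantity"] == 0, "Quantity"] = 1
    # 4. Eliminate negative Quantity (returns that outweigh purchases).
    return grouped[grouped["Quantity"] > 0].reset_index(drop=True)


# Toy rows, invented for this example.
raw = pd.DataFrame({
    "CustomerID": [17850.0, 17850.0, None, 13047.0, 13047.0, 12583.0],
    "StockCode": ["85123A", "85123A", "71053", "84879", "84879", "22728"],
    "Quantity": [6, 2, 8, 4, -4, -3],
})
grouped_df = clean_retail(raw)
print(grouped_df)
```

Note that the order of steps 3 and 4 matters: a pair whose purchases and returns cancel out is kept with a Quantity of one, while a pair with a net negative total is dropped.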
quantity_hist.py
Figure 2
retail_count.py
Figure 3

Implicit Feedback

Instead of representing an explicit rating, the “Quantity” can represent a “confidence” in how strong the interaction was. Items that a customer bought in larger quantities carry more weight in our ratings matrix of “Quantity”.
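One common way to turn a quantity into a confidence is the linear form c = 1 + alpha * r from the implicit-feedback literature (Hu, Koren and Volinsky), paired with a binary preference of 1 if the customer bought the item at all. The sketch below assumes that formulation. The value of alpha is an arbitrary choice for illustration, and the helper names are hypothetical.

```python
import numpy as np

ALPHA = 15.0  # assumed scaling factor; tune it for your own data


def confidence(quantity, alpha=ALPHA):
    """Confidence grows linearly with quantity; unseen pairs keep a baseline of 1."""
    return 1.0 + alpha * np.asarray(quantity, dtype=np.float64)


def preference(quantity):
    """Binary preference: 1 if the customer bought the item at all, else 0."""
    return (np.asarray(quantity) > 0).astype(np.float64)


quantities = np.array([0, 1, 12])
print(confidence(quantities))  # [  1.  16. 181.]
print(preference(quantities))  # [0. 1. 1.]
```

Under this form, a customer who bought 12 units of an item is weighted far more heavily than one who bought a single unit, while both count as a positive preference.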

  • We will create numeric “customer_id” and “item_id” columns.
  • Create two matrices, one for fitting the model (item-customer) and another one for recommendations (customer-item).
  • Initialize the Alternating Least Squares (ALS) recommendation model.
  • Fit the model using the sparse item-customer matrix.
  • We set the type of our matrix to double for the ALS function to run properly.
online_retail_ALS.py
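To make the steps above concrete, here is a minimal sketch on a toy frame. The two sparse matrices are built with pandas category codes and SciPy. The fit_als function is a small, dense NumPy version of implicit-feedback ALS written only for illustration. It is not the Implicit library's implementation, and its name, default parameters, and the toy data are all assumptions.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sparse

# Toy interactions, invented for this example.
grouped_df = pd.DataFrame({
    "CustomerID": [17850, 17850, 13047, 13047, 12583, 12583],
    "StockCode": ["85123A", "71053", "85123A", "84879", "84879", "22728"],
    "Quantity": [8, 6, 2, 4, 3, 1],
})

# Numeric ids: category codes run from 0 to n-1 in sorted order.
grouped_df["customer_id"] = grouped_df["CustomerID"].astype("category").cat.codes
grouped_df["item_id"] = grouped_df["StockCode"].astype("category").cat.codes

rows = grouped_df["item_id"].to_numpy()
cols = grouped_df["customer_id"].to_numpy()
data = grouped_df["Quantity"].to_numpy().astype(np.float64)  # double precision

sparse_item_customer = sparse.csr_matrix((data, (rows, cols)))  # for fitting
sparse_customer_item = sparse_item_customer.T.tocsr()           # for recommending


def fit_als(item_customer, factors=2, regularization=0.1, alpha=15.0,
            iterations=10, seed=0):
    """Toy implicit ALS (hypothetical helper): dense, so only for tiny matrices."""
    R = item_customer.T.toarray()        # customers x items
    P = (R > 0).astype(np.float64)       # binary preferences
    C = 1.0 + alpha * R                  # confidence weights
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.01, size=(R.shape[0], factors))  # customer vectors
    Y = rng.normal(scale=0.01, size=(R.shape[1], factors))  # item vectors
    reg = regularization * np.eye(factors)
    for _ in range(iterations):
        # Fix the item vectors and solve a weighted least squares per customer.
        for u in range(R.shape[0]):
            YtC = Y.T * C[u]
            X[u] = np.linalg.solve(YtC @ Y + reg, YtC @ P[u])
        # Fix the customer vectors and solve per item.
        for i in range(R.shape[1]):
            XtC = X.T * C[:, i]
            Y[i] = np.linalg.solve(XtC @ X + reg, XtC @ P[:, i])
    return X, Y


customer_vecs, item_vecs = fit_als(sparse_item_customer)
print(customer_vecs.shape, item_vecs.shape)
```

The name “alternating” comes from the two inner loops: with one side held fixed, each customer (or item) vector has a closed-form least squares solution, so the model simply switches back and forth between the two sides.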

Example of Recommendation — Finding the Similar Items

Let’s start with “WHITE METAL LANTERN”. We found that the “item_id” for “WHITE METAL LANTERN” is 1319.

grouped_df.loc[grouped_df['item_id'] == 1319].head()
Table 2
  • Get the customer and item vectors from our trained model.
  • Calculate the vector norms.
  • Calculate the similarity score.
  • Get the top 10 items.
  • Create a list of item-score tuples for the items most similar to this item.
similar_item.py
Figure 4
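The similarity steps listed above boil down to cosine similarity between item vectors. Here is a minimal sketch that uses hand-made vectors in place of a trained model. The function name similar_items and the toy vectors are assumptions for illustration, and it assumes no item vector is all zeros.

```python
import numpy as np


def similar_items(item_vecs, item_id, n_similar=10):
    """Return (item_id, score) tuples for the items closest to item_id (hypothetical helper)."""
    # Vector norms, one per item (assumed non-zero).
    item_norms = np.linalg.norm(item_vecs, axis=1)
    # Cosine similarity between the chosen item and every item.
    scores = item_vecs @ item_vecs[item_id] / (item_norms * item_norms[item_id])
    # Indices of the top scores, best first.
    top_idx = np.argsort(-scores)[:n_similar]
    return [(int(i), float(scores[i])) for i in top_idx]


# Toy item vectors standing in for the trained model's item factors.
item_vecs = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
result = similar_items(item_vecs, item_id=0, n_similar=3)
print(result)
```

The item itself always comes back first with a score of 1, so in practice you may want to ask for one extra result and drop the first entry.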

Example of Recommendation — Recommend Items to Customers

The following function returns the top 10 recommendations for any given customer. They are chosen based on the customer/item vectors, from among the items that customer has never purchased.

  • Get the purchase score from the sparse customer-item matrix.
  • Add 1 to everything, so that items with no purchase yet become equal to 1.
  • Make items already purchased zero.
  • Get dot product of customer vector and all item vectors.
  • Scale this recommendation vector between 0 and 1.
  • Items already purchased have their recommendation multiplied by zero.
  • Sort the item indices in order of best recommendation.
  • Start an empty list to store descriptions and scores.
  • Append descriptions and scores to the list.
  • Get the trained customer and item vectors. We convert them to csr matrices.
  • Create recommendations for customer with id 2.
RecSys_ALS_items.py
Figure 5
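The recommendation steps above can be sketched as follows. This is a minimal sketch, not the contents of RecSys_ALS_items.py: it keeps the vectors as dense NumPy arrays instead of csr matrices, returns item ids rather than descriptions, and uses a hypothetical function name and toy data.

```python
import numpy as np
import scipy.sparse as sparse


def recommend(customer_id, sparse_customer_item, customer_vecs, item_vecs,
              num_items=10):
    """Return (item_id, score) tuples for items the customer has not bought."""
    # Purchase scores for this customer as a dense 1-D array.
    purchased = sparse_customer_item[customer_id, :].toarray().ravel()
    # Add 1 so unpurchased items become 1, then zero out the purchased ones.
    mask = purchased + 1
    mask[mask > 1] = 0
    # Dot product of the customer vector with all item vectors.
    scores = item_vecs @ customer_vecs[customer_id]
    # Scale the recommendation vector to the range 0 to 1.
    span = scores.max() - scores.min()
    scaled = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    # Purchased items have their score multiplied by zero.
    final = mask * scaled
    # Sort item indices best first, keeping only unpurchased items.
    order = [i for i in np.argsort(-final) if mask[i] > 0][:num_items]
    return [(int(i), float(final[i])) for i in order]


# Toy stand-ins for the trained vectors and the customer-item matrix.
customer_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
item_vecs = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.7, 0.0]])
sparse_customer_item = sparse.csr_matrix(
    np.array([[3.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]])
)

recommendations = recommend(0, sparse_customer_item, customer_vecs, item_vecs,
                            num_items=2)
print(recommendations)
```

Item 0 scores highest for customer 0 but is left out because it was already purchased. Mapping the returned ids back to item descriptions is a lookup on the grouped data.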
top20_purchase.py
Figure 6

Susan Li

Changing the world, one post at a time. Sr Data Scientist, Toronto Canada. https://www.linkedin.com/in/susanli/