photo credit: pixabay

Building A Recommender System With Implicit Feedback Datasets Using Alternating Least Squares

RecSys, ALS, Collaborative Filtering

In real-world scenarios, most feedback is implicit rather than explicit. Implicit feedback, such as clicks, view times and purchases, is tracked automatically, which makes it much easier to collect. Think of yourself: how often do you rate a product after buying it, compared with how often you click on or buy one? In fact, implicit feedback is already available in almost every information system. Web servers, for example, record every page access in their log files.

In this post, we will focus on item recommendation: creating a user-specific ranking for a set of items. A user's preferences are learned from their past interactions with the system, which in this case is their online purchasing history. In other words, we will build a recommender system that gives customers personalized recommendations based on what they have bought before.

The Data

The data we are using today is the Online Retail dataset from the UCI Machine Learning Repository. For the matrix factorization we will use Implicit, a fast Python library for collaborative filtering on implicit feedback datasets.

online_retail_data.py
Figure 1
  • Many rows are missing a “CustomerID”, so we remove them.
  • Group by “CustomerID” and “StockCode” and sum “Quantity”, so that each row holds one customer's total purchases of one item.
  • If the summed “Quantity” is 0 (for example, an item that was bought and then fully returned), we set it to 1 so the interaction still counts.
  • Remove rows where “Quantity” is still negative.
clean_retail_data.py
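The four cleaning steps above could look roughly like this in pandas. This is a minimal sketch, not the contents of clean_retail_data.py: it assumes the raw UCI columns are named CustomerID, StockCode and Quantity, and the function name clean_retail and the toy rows are made up for illustration.

```python
import pandas as pd


def clean_retail(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop anonymous rows, aggregate, fix quantities."""
    # 1. Remove rows with no CustomerID
    df = df.dropna(subset=["CustomerID"]).copy()
    df["CustomerID"] = df["CustomerID"].astype(int)
    df["StockCode"] = df["StockCode"].astype(str)

    # 2. One row per (customer, item) holding the total quantity purchased
    grouped = df.groupby(["CustomerID", "StockCode"], as_index=False)["Quantity"].sum()

    # 3. A net quantity of 0 still means an interaction happened: count it as 1
    grouped.loc[grouped["Quantity"] == 0, "Quantity"] = 1

    # 4. Remove customer-item pairs whose net quantity is still negative
    return grouped[grouped["Quantity"] > 0].reset_index(drop=True)


# Made-up toy rows (not taken from the real dataset)
raw = pd.DataFrame({
    "CustomerID": [17850.0, 17850.0, 17850.0, None, 13047.0, 13047.0],
    "StockCode": ["85123A", "85123A", "71053", "22752", "71053", "22752"],
    "Quantity": [6, -6, 6, 2, -3, 4],
})
print(clean_retail(raw))
```

Here customer 17850's purchase and full return of 85123A nets to 0 and is kept as 1, while customer 13047's net-negative 71053 row is dropped.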

Here is our grouped data.

Table 1
quantity_hist.py
Figure 2

The vast majority of customer-item interactions involve fewer than 40 units of the item; very few involve more than 2,000 units.

retail_count.py
Figure 3

Implicit Feedback

Instead of representing an explicit rating, “Quantity” can represent a “confidence”: how strong the interaction was. Items a customer bought in larger quantities carry more weight in our matrix of quantities than items they bought only once.
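One common way to turn quantities into confidence comes from Hu, Koren and Volinsky's paper "Collaborative Filtering for Implicit Feedback Datasets", which the Implicit library's ALS model is based on. Each quantity r is split into a binary preference p (did the customer buy the item at all?) and a confidence c = 1 + alpha * r. The sketch below illustrates that formulation; the scaling factor alpha = 15 and the function name are hypothetical choices, not values taken from this post.

```python
import numpy as np


def to_preference_and_confidence(quantity, alpha=15.0):
    """Hypothetical helper for the Hu/Koren/Volinsky transform.

    preference p = 1 if the customer bought the item, else 0
    confidence c = 1 + alpha * quantity (alpha is an assumed, tunable value)
    """
    quantity = np.asarray(quantity, dtype=float)
    preference = (quantity > 0).astype(float)
    confidence = 1.0 + alpha * quantity
    return preference, confidence


p, c = to_preference_and_confidence([0, 1, 40, 2000])
print(p)  # [0. 1. 1. 1.]
print(c)
```

An unpurchased item still gets confidence 1, so the model treats "not bought" as weak evidence of disinterest rather than ignoring it, while a 2,000-unit purchase counts far more than a single unit.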

  • We will create numeric “customer_id” and “item_id” columns.
  • Create two matrices, one for fitting the model (item-customer) and another one for recommendations (customer-item).
  • Initialize the Alternating Least Squares (ALS) recommendation model.
  • Fit the model using the sparse item-customer matrix.
  • We set the type of our matrix to double for the ALS function to run properly.
online_retail_ALS.py
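The steps above can be sketched end to end on a toy dataset. To keep the example self-contained it does not call the Implicit library; instead it swaps in a small dense implementation of the textbook implicit-feedback ALS updates (the Hu, Koren and Volinsky formulation), which is only practical for tiny matrices. The function names, the hyperparameters (factors, reg, alpha, iterations) and the toy rows are all hypothetical. If you use the Implicit library itself, note that versions 0.5 and later expect a customer-item (user-item) matrix in fit rather than the item-customer matrix described above, so check which version you have installed.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def build_matrices(grouped):
    """Add numeric ids and build item-customer and customer-item matrices."""
    grouped = grouped.copy()
    # Category codes give contiguous 0..n-1 ids usable as matrix indices
    grouped["customer_id"] = grouped["CustomerID"].astype("category").cat.codes
    grouped["item_id"] = grouped["StockCode"].astype("category").cat.codes
    qty = grouped["Quantity"].to_numpy(dtype=np.float64)  # double precision
    items = grouped["item_id"].to_numpy()
    customers = grouped["customer_id"].to_numpy()
    item_customer = sparse.csr_matrix((qty, (items, customers)))
    customer_item = sparse.csr_matrix((qty, (customers, items)))
    return grouped, item_customer, customer_item


def implicit_als(customer_item, factors=10, reg=0.1, alpha=15.0, iterations=10, seed=0):
    """Dense textbook ALS for implicit feedback; toy-sized matrices only."""
    R = customer_item.toarray()
    P = (R > 0).astype(float)  # binary preference: bought or not
    C = 1.0 + alpha * R        # confidence grows with quantity
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(R.shape[0], factors))  # customer vectors
    Y = rng.normal(scale=0.1, size=(R.shape[1], factors))  # item vectors
    reg_eye = reg * np.eye(factors)

    def solve(fixed, conf, pref):
        # For each row u: (F^T C_u F + reg * I) x_u = F^T C_u p_u
        out = np.empty((conf.shape[0], factors))
        for u in range(conf.shape[0]):
            ft_c = fixed.T * conf[u]  # F^T C_u without building diag(C_u)
            out[u] = np.linalg.solve(ft_c @ fixed + reg_eye, ft_c @ pref[u])
        return out

    for _ in range(iterations):
        X = solve(Y, C, P)      # items fixed: update customer vectors
        Y = solve(X, C.T, P.T)  # customers fixed: update item vectors
    return X, Y


# Made-up toy interactions (already cleaned and grouped)
toy = pd.DataFrame({
    "CustomerID": [12346, 12346, 12347, 12347, 12348, 12348, 12349],
    "StockCode": ["A", "B", "A", "B", "C", "D", "C"],
    "Quantity": [6, 2, 12, 1, 3, 24, 5],
})
grouped_df, item_customer, customer_item = build_matrices(toy)
customer_vecs, item_vecs = implicit_als(customer_item, factors=3)
print(item_customer.shape, customer_vecs.shape, item_vecs.shape)  # (4, 4) (4, 3) (4, 3)
```

Each half-step solves one small linear system per customer (or per item) exactly, which is why the method is called alternating least squares; the library does the same work with sparse, optimized solvers instead of this dense loop.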

Example of Recommendation — Finding the Similar Items

Let’s start with “WHITE METAL LANTERN”, whose “item_id” is 1319.

grouped_df.loc[grouped_df['item_id'] == 1319].head()
Table 2
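One way to find an item's id from its name is to search the descriptions for the text and map the matching stock codes to item_id. This sketch assumes a hypothetical item_lookup DataFrame with StockCode and Description columns alongside grouped_df; the function name and toy rows are made up. It returns a list because a search string may match more than one stock code.

```python
import pandas as pd


def find_item_ids(grouped_df, item_lookup, text):
    """Hypothetical helper: item_id values whose description contains `text`."""
    matches = item_lookup["Description"].str.contains(text, case=False, regex=False, na=False)
    codes = item_lookup.loc[matches, "StockCode"]
    ids = grouped_df.loc[grouped_df["StockCode"].isin(codes), "item_id"].unique()
    return sorted(int(i) for i in ids)


# Toy tables, for illustration only
item_lookup = pd.DataFrame({
    "StockCode": ["85123A", "71053", "84406B"],
    "Description": ["WHITE HANGING HEART T-LIGHT HOLDER", "WHITE METAL LANTERN",
                    "CREAM CUPID HEARTS COAT HANGER"],
})
grouped_df = pd.DataFrame({"StockCode": ["71053", "85123A", "71053", "84406B"],
                           "item_id": [1, 0, 1, 2]})
print(find_item_ids(grouped_df, item_lookup, "white metal lantern"))  # [1]
```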

Finding the 10 most similar items to “WHITE METAL LANTERN”.

  • Get the customer and item vectors from our trained model.
  • Calculate the vector norms.
  • Calculate the cosine similarity score: the dot product of two item vectors divided by the product of their norms.
  • Get the top 10 items.
  • Create a list of (item, score) tuples for the items most similar to this one.
similar_item.py
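The similarity steps can be sketched with NumPy as cosine similarity between item vectors. This assumes item_vecs is a dense NumPy array with one row per item (for example, the item factors of a trained model) and that no item vector is all zeros; the function name similar_items and the random test vectors are hypothetical.

```python
import numpy as np


def similar_items(item_vecs, item_id, n=10):
    """Top-n items by cosine similarity to item_id, as (item, score) tuples."""
    item_norms = np.linalg.norm(item_vecs, axis=1)
    # Dot product with the query item, divided by both vector norms
    scores = item_vecs @ item_vecs[item_id] / (item_norms * item_norms[item_id])
    top = np.argpartition(scores, -n)[-n:]  # the n best, in no particular order
    top = top[np.argsort(-scores[top])]     # order those n best first
    return [(int(i), float(scores[i])) for i in top]


rng = np.random.default_rng(0)
vecs = rng.normal(size=(50, 8))  # stand-in for trained item vectors
for item, score in similar_items(vecs, 7):
    print(item, round(score, 3))
```

np.argpartition finds the top n in linear time, so only those n scores need a full sort; the query item always comes first with a score of 1.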
Figure 4

The first item is always the item itself, since any vector has a cosine similarity of 1 with itself. I will let you judge whether the remaining nine items are similar to it.

Example of Recommendation — Recommend Items to Customers

The following function returns the top 10 recommendations for a given customer, chosen from the items that customer has never purchased and ranked using the customer and item vectors.

  • Get the customer's purchase quantities from the sparse customer-item matrix.
  • Add 1 to everything, so that items with no purchase yet become equal to 1.
  • Make items already purchased zero.
  • Get dot product of customer vector and all item vectors.
  • Scale this recommendation vector between 0 and 1.
  • Items already purchased have their recommendation multiplied by zero.
  • Sort the item indices by recommendation score, best first.
  • Start an empty list to store descriptions and scores.
  • Append descriptions and scores to the list.
  • Get the trained customer and item vectors. We convert them to csr matrices.
  • Create recommendations for the customer with id 2.
RecSys_ALS_items.py
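Here is a sketch of the recommendation steps in NumPy. Unlike the steps above, it keeps the customer and item vectors as dense NumPy arrays instead of converting them to csr matrices, and it builds the not-yet-purchased mask with a direct comparison, which gives the same 1/0 mask as adding 1 and zeroing purchased items when quantities are positive. Descriptions are left out: the function returns (item_id, score) pairs, and its name, signature and toy inputs are hypothetical.

```python
import numpy as np
from scipy import sparse


def recommend(customer_id, customer_item, customer_vecs, item_vecs, n=10):
    """Top-n items this customer has not purchased yet, as (item_id, score)."""
    # The customer's purchase quantities as a dense 1-D array
    row = customer_item[customer_id].toarray().ravel()
    # 1 for items not purchased yet, 0 for items already purchased
    not_bought = (row == 0).astype(float)
    # Dot product of the customer vector with every item vector
    rec = item_vecs @ customer_vecs[customer_id]
    # Min-max scale the scores into [0, 1]
    span = rec.max() - rec.min()
    rec = (rec - rec.min()) / span if span > 0 else np.zeros_like(rec)
    # Already-purchased items are multiplied by zero
    final = not_bought * rec
    # Item indices ordered from best to worst recommendation
    best = np.argsort(final)[::-1][:n]
    return [(int(i), float(final[i])) for i in best]


# Toy inputs: customer 0 bought items 0 and 3
customer_item = sparse.csr_matrix(np.array([[3, 0, 0, 1, 0], [0, 2, 0, 0, 0]], dtype=float))
customer_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
item_vecs = np.array([[5.0, 0.0], [4.0, 0.0], [3.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
print(recommend(0, customer_item, customer_vecs, item_vecs, n=2))  # [(1, 0.75), (2, 0.5)]
```

Because of the min-max scaling, the lowest-scoring unpurchased item also ends up at 0 and ties with the purchased items. With thousands of items and only 10 recommendations this rarely matters, but it is worth knowing if you ask for long lists.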
Figure 5

Now we have the top 10 recommendations for customer_id 2. Do they make sense? Let’s look at the top 20 items this customer has purchased.

top20_purchase.py
Figure 6
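Listing a customer's biggest purchases takes a filter, a sort and a join. This sketch assumes grouped_df has customer_id, StockCode and Quantity columns and that a hypothetical item_lookup table maps StockCode to Description; the function name and toy rows are made up.

```python
import pandas as pd


def top_purchases(grouped_df, item_lookup, customer_id, n=20):
    """Hypothetical helper: the n items this customer bought in the largest quantities."""
    bought = grouped_df.loc[grouped_df["customer_id"] == customer_id, ["StockCode", "Quantity"]]
    top = bought.sort_values("Quantity", ascending=False).head(n)
    # Keep one description per StockCode so the join does not duplicate rows
    names = item_lookup.drop_duplicates("StockCode")
    # A left merge keeps the left frame's (sorted) row order
    out = top.merge(names, on="StockCode", how="left")
    return out[["StockCode", "Description", "Quantity"]]


grouped_df = pd.DataFrame({"customer_id": [2, 2, 2, 3],
                           "StockCode": ["a", "b", "c", "a"],
                           "Quantity": [5, 50, 1, 100]})
item_lookup = pd.DataFrame({"StockCode": ["a", "b", "c", "b"],
                            "Description": ["ITEM A", "ITEM B", "ITEM C", "ITEM B (ALT)"]})
print(top_purchases(grouped_df, item_lookup, customer_id=2, n=2))
```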

This customer’s top purchases were lip glosses, designed tissues, holiday cake cases and similar items, the kind of things people buy when hosting holiday parties. The items we recommended to them include fruit straws, gift boxes and cocktail parasols, which are also things people buy when hosting a party. Remember “Customers who bought this item also bought…”?

The best evaluation metric for a recommender system is how much value it adds for the customers and/or the business, that is, whether it increases sales and profits. Measuring this requires some kind of online A/B testing.

However, there are other common metrics for evaluating the performance of a recommender in isolation. Following this tutorial, we calculated the AUC for each customer in our training set who had purchased at least one item, along with the AUC of simply recommending the most popular items, for comparison.
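For reference, a customer's AUC is the probability that a randomly chosen item they actually bought (in held-out data) is scored above a randomly chosen item they did not buy; 0.5 is no better than chance. The sketch below computes it straight from that definition and compares a model's scores with a popularity baseline (total purchases per item). It is not the tutorial's code: the held-out split, the toy numbers and the function name are assumptions. If scikit-learn is available, sklearn.metrics.roc_auc_score gives the same value.

```python
import numpy as np


def auc(scores, held_out):
    """AUC for one customer.

    scores: 1-D model scores for the candidate items
    held_out: 1-D 0/1 array, 1 where the customer bought the item in held-out data
    Ties between a positive and a negative count as half a win.
    """
    scores = np.asarray(scores, dtype=float)
    held_out = np.asarray(held_out).astype(bool)
    pos, neg = scores[held_out], scores[~held_out]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined without both classes
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))


held_out = np.array([0, 1, 0, 1, 0])                  # toy held-out purchases
model_scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2])    # toy model scores
popularity = np.array([40, 3, 25, 10, 7])             # toy total purchases per item
print(auc(model_scores, held_out))  # 1.0
print(auc(popularity, held_out))
```

Averaging this value over all customers, once for the model and once for the popularity scores, gives the kind of comparison described above.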

The Jupyter notebook can be found on GitHub. Enjoy the rest of the weekend!

Reference: https://towardsdatascience.com/building-a-collaborative-filtering-recommender-system-with-clickstream-data-dffc86c8c65

Susan Li

Changing the world, one post at a time. Sr Data Scientist, Toronto Canada. https://www.linkedin.com/in/susanli/