Recommendation engine starting point


This page presents some ideas for using topic modeling as a starting point for a recommendation engine.

Instead of documents and terms drawn from the entire vocabulary, one would have customers and the products they purchased out of the entire inventory.

The VBM (Vector Based Model) can use an SVD (Singular Value Decomposition) method.
The LSI (Latent Semantic Indexing) method is based on this model.

Document-Document similarity would then become customer-customer similarity.
Term-Term similarity would then become purchase-purchase similarity.
Document-Term similarity would become customer-purchase similarity.
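As a rough illustration of the three similarities above, here is a minimal sketch using NumPy. The toy purchase matrix, the choice of k = 2 latent dimensions, and the helper name cosine_matrix are all assumptions made for this example, not part of any particular application.

```python
import numpy as np

# Hypothetical toy data: rows are customers, columns are products,
# and a 1 means the customer purchased the product.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

k = 2  # number of latent dimensions kept (an assumption; tuned in practice)

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
customer_vecs = U[:, :k] * s[:k]     # one k-dimensional row per customer
product_vecs = Vt[:k, :].T * s[:k]   # one k-dimensional row per product

def cosine_matrix(X, Y):
    """Cosine similarity between every row of X and every row of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

customer_customer = cosine_matrix(customer_vecs, customer_vecs)
product_product = cosine_matrix(product_vecs, product_vecs)

# One common LSI choice for document-term (here customer-product) scores
# is the rank-k reconstruction of the original matrix.
customer_product = customer_vecs @ Vt[:k, :]
```

In this toy data the last two customers bought exactly the same products, so their customer-customer similarity comes out as 1.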

To try/prototype this for a specific recommendation application, the following snapshot of the data is needed.

List of all customers (i.e., documents) where only a unique id number is needed (to anonymize test data).

List of all products (i.e., words) where a unique product id and text string are needed (to qualitatively evaluate the results).

List of all products purchased by each customer, which is a list of (customer id, product id) pairs.

Dates and other information can initially be ignored.
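The three lists above could be turned into a customer-by-product matrix along the following lines. This is only a sketch: the ids, product names, and the function name build_purchase_matrix are made up for illustration, and a real inventory would likely call for a sparse matrix rather than nested lists.

```python
# Hypothetical snapshot data; ids and names are made up for illustration.
customers = [101, 102, 103]
products = {"p1": "garden hose", "p2": "hose nozzle", "p3": "paint brush"}
purchases = [(101, "p1"), (101, "p2"), (102, "p1"), (103, "p3"), (101, "p1")]

def build_purchase_matrix(customers, products, purchases):
    """Return (matrix, row_of, col_of) for a binary customer-by-product matrix."""
    row_of = {cid: i for i, cid in enumerate(customers)}   # customer id -> row
    col_of = {pid: j for j, pid in enumerate(products)}    # product id -> column
    matrix = [[0] * len(col_of) for _ in row_of]
    for cid, pid in purchases:
        matrix[row_of[cid]][col_of[pid]] = 1  # repeat purchases collapse to 1
    return matrix, row_of, col_of

matrix, row_of, col_of = build_purchase_matrix(customers, products, purchases)
for cid in customers:
    print(cid, matrix[row_of[cid]])
```

Recording a 1 for any purchase, rather than a count, is a design choice made here to keep the example simple; counts or weights could be used instead.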

It is important to know how many customers and how many products are involved, as this affects efficiency/implementation issues.

Known customer: When a customer is visiting a product page, it is assumed they might want to purchase that product. If the customer is known, then prior purchase behavior is part of the matching (via similarity).

Unknown customer: If the customer is unknown, then only this product is used for the matching (via similarity), as if an anonymous customer had purchased just this product.

In both cases, a customer-customer similarity (in the reduced-dimension vector space of LSI) makes it possible to determine the products most commonly purchased by similar customers. This is the "Customers like you purchased these products..." list.
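One possible way to produce such a list for a known customer is sketched below: find the nearest customers in the latent space, then count what they bought. The function name customers_like_you and the defaults for k, n_neighbors, and n_items are illustrative assumptions.

```python
import numpy as np

def customers_like_you(A, customer, k=2, n_neighbors=2, n_items=3):
    """Rank products bought by the customers most similar to `customer`.

    A is a customer-by-product purchase matrix and `customer` is a row index.
    """
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    vecs = U[:, :k] * s[:k]                        # latent customer vectors
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.maximum(norms, 1e-12)         # guard against zero rows
    sims = unit @ unit[customer]                   # cosine to every customer
    sims[customer] = -np.inf                       # exclude the customer
    neighbors = np.argsort(sims)[::-1][:n_neighbors]
    scores = A[neighbors].sum(axis=0)              # neighbors who bought each
    scores[A[customer] > 0] = 0                    # skip products already bought
    ranked = np.argsort(scores)[::-1]
    return [int(j) for j in ranked[:n_items] if scores[j] > 0]

# Hypothetical toy data: rows are customers, columns are products.
A = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
print(customers_like_you(A, customer=0, n_neighbors=1))
```

Here the SVD is recomputed inside the function only to keep the example self-contained; in practice the latent vectors would be computed once and reused.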

The corresponding product-product similarity (in the reduced-dimension vector space of LSI), using the product on this page, should provide the "Products similar to this one" list.
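A lookup of this kind might look like the sketch below, which assumes the latent product vectors have already been computed. The vectors, product names, and the function name similar_products are hypothetical.

```python
import numpy as np

def similar_products(product_vecs, names, product, n=2):
    """Return the names of the n products closest to `product` by cosine."""
    X = np.asarray(product_vecs, dtype=float)
    unit = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    sims = unit @ unit[product]
    sims[product] = -np.inf          # do not recommend the product itself
    order = np.argsort(sims)[::-1][:n]
    return [names[int(j)] for j in order]

# Hypothetical 2-dimensional product vectors, as an off-line SVD might produce.
product_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
names = ["garden hose", "hose nozzle", "paint brush", "paint roller"]
print(similar_products(product_vecs, names, product=0, n=1))
```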

The LSI method allows a new customer and/or product to be matched against the pre-computed matrices of previous customer-purchase behavior.
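This matching of a new customer is often called "folding in". A minimal sketch, assuming customer vectors are stored as U_k S_k (equivalently A V_k), is to multiply the new purchase row by V_k. The toy matrix and the names fold_in_customer and most_similar_customers are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy data: rows are customers, columns are products.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
k = 2  # latent dimensions kept (an assumption)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :].T            # products x k, pre-computed off-line
customer_vecs = A @ Vk      # same as U[:, :k] * s[:k]

def fold_in_customer(purchase_row, Vk):
    """Project a new customer's purchase row into the existing latent space."""
    return np.asarray(purchase_row, dtype=float) @ Vk

def most_similar_customers(vec, customer_vecs, n=2):
    """Row indices of the n existing customers closest to vec by cosine."""
    norms = np.linalg.norm(customer_vecs, axis=1) * np.linalg.norm(vec)
    sims = customer_vecs @ vec / np.maximum(norms, 1e-12)
    return [int(i) for i in np.argsort(sims)[::-1][:n]]

# Unknown visitor looking at product 3: treated as having bought only that product.
visitor = fold_in_customer([0, 0, 0, 1], Vk)
print(most_similar_customers(visitor, customer_vecs))
```

No new SVD is needed for the visitor; only one small matrix-vector product is computed against the stored V_k.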

Pre-computing the similarity matrices can involve significant computation and might be done, for example, once a day or once a week. These off-line computations could be done with robust, high-performance Python code and linear algebra libraries that support multiple processors/machines. Once that is done, the customer-customer, product-product, and customer-product similarity lookups can be done much more quickly and easily in real time.
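The off-line step could be as simple as the sketch below: compute the truncated factors on a schedule, write them to a file, and have the real-time side load that file. The file name model.npz and the function names precompute and load_model are hypothetical. For a large sparse purchase matrix, a truncated sparse solver such as scipy.sparse.linalg.svds would probably be a better fit than the dense SVD used here.

```python
import os
import tempfile
import numpy as np

def precompute(A, k, path):
    """Off-line step: factor the purchase matrix and save the pieces."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    np.savez(path, customer_vecs=U[:, :k] * s[:k], Vk=Vt[:k, :].T)

def load_model(path):
    """Real-time side: load the pre-computed factors."""
    with np.load(path) as data:
        return data["customer_vecs"], data["Vk"]

# Hypothetical run: in practice `path` would be a shared location
# written by a nightly or weekly job.
path = os.path.join(tempfile.mkdtemp(), "model.npz")
precompute([[1, 1, 0], [0, 1, 1]], k=2, path=path)
customer_vecs, Vk = load_model(path)
print(customer_vecs.shape, Vk.shape)
```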

Such real-time computations can be done using, say, C#. To add this capability to a web page, the web page would call a (to-be-created) C# class with the customer id (or unknown id) and a product id.

The C# class would, through the appropriate data access and code actions, compute and return the info needed to populate that part of the web page.