Re: people who buy A, also buy C, D, E

Поиск
Список
Период
Сортировка
От PFC
Тема Re: people who buy A, also buy C, D, E
Дата
Msg-id op.sszjw2thth1vuj@localhost
обсуждение исходный текст
Ответ на Re: people who buy A, also buy C, D, E  (Jan Wieck <JanWieck@Yahoo.com>)
Ответы Re: people who buy A, also buy C, D, E
Список pgsql-sql

>>> The goal of my query is: given a book, what did other people who
>>> bought this book also buy?  I plan the list the 5 most popular such
>>> books.
You can use the table listing ordered products directly, for example :

table ordered_products:    order_id,  product_id,  quantity

SELECT b.product_id, sum(quantity) as rank FROM ordered_products a,  
ordered_products b WHERE a.product_id=(the product id) AND  
b.order_id=a.order_id AND b.product_id != a.product_id GROUP BY  
b.product_id ORDER BY rank DESC LIMIT 6;

This will need indexes on order_id and product_id that you probably  
already have.
It will also be slow.
You can also have a cache table :

cache    prod_id_a, prod_id_b, quantity
With a constraint that prod_id_a < prod_id_b

You add a trigger on insert, update or delete to ordered_products to  
insert or update rows in this table, modifying the quantity according to  
the purchase.

To select you do :

SELECT * FROM
(
(SELECT prod_id_b as pid, quantity FROM cache WHERE prod_id_a=(your id)  
ORDER BY prod_id_a DESC, quantity DESC LIMIT 5)
UNION ALL
(SELECT prod_id_a as pid, quantity FROM cache WHERE prod_id_b=(your id)  
ORDER BY prod_id_b DESC, quantity DESC LIMIT 5)
) as foo
ORDER BY quantity DESC
LIMIT 5;

It will be probably very fast but the table will grow huge and need  
various indexes :
(prod_id_a, quantity)
(prod_id_b quantity)
(prod_id_a, prod_id_b)    (the primary key)

You'll get 1/2 * N * (N-1) rows, N being the number of products on your  
site. If you remove the constraint  prod_id_a < prod_id_b you'll get N^2 rows which is worse.
Another solution :

Table cache : product_id integer, also_purchased integer[]

After every order, update also_purchased with the results of the query  
using the self join on ordered_products tables above.
This query should not be fast enough to use in a product webpage but it  
shouldn't be slow enough to be used like thi, only when orders are made.

To get the "also purchased products" all you have to do is read a line in  
this table.














В списке pgsql-sql по дате отправления:

Предыдущее
От: "Jim Buttafuoco"
Дата:
Сообщение: Re: multiple PK with a non UNIQUE field
Следующее
От: PFC
Дата:
Сообщение: assorted problems with intarray and other GiST contribs.