Re: optimizing a cpu-heavy query

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: optimizing a cpu-heavy query
Дата	26 апреля 2011 г. 16:00:59
Msg-id	8038.1303833650@sss.pgh.pa.us обсуждение исходный текст
Ответ на	optimizing a cpu-heavy query (Joel Reymont <joelr1@gmail.com>)
Ответы	Re: optimizing a cpu-heavy query (Joel Reymont <joelr1@gmail.com>)
Список	pgsql-general

Дерево обсуждения

Joel Reymont <joelr1@gmail.com> writes:
> I'm trying to optimize the following query that performs KL Divergence [1]. As you can see the distance function
operateson vectors of 150 floats.  

> CREATE OR REPLACE FUNCTION docs_within_distance(vec topics, threshold float)
> RETURNS TABLE(id doc_id, distance float) AS $$
> BEGIN
>     RETURN QUERY
>     SELECT *
>     FROM (SELECT doc_id, (SELECT sum(vec[i] * ln(vec[i] / topics[i]))
>                       FROM generate_subscripts(topics, 1) AS i
>                       WHERE topics[i] > 0) AS distance
>           FROM docs) AS tab
>     WHERE tab.distance <= threshold;
> END;
> $$ LANGUAGE plpgsql;

Yikes.  That sub-select is a mighty expensive way to compute the scalar
product.  Push it into a sub-function that takes the two arrays and
iterates over them with a for-loop.  For another couple orders of
magnitude, convert the sub-function to C code.  (I don't think you need
a whole data type, just a function that does the scalar product.)

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее

От: Scott Marlowe
Дата: 26 апреля 2011 г., 15:53:22
Сообщение: Re: Recover database from binary files

Следующее

От: Michael Nolan
Дата: 26 апреля 2011 г., 16:26:12
Сообщение: Re: SSDs with Postgresql?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: optimizing a cpu-heavy query

Предыдущее

Следующее