Re: Slow query: table iteration (8.3)

Поиск
Список
Период
Сортировка
От Glenn Maynard
Тема Re: Slow query: table iteration (8.3)
Дата
Msg-id bd36f99e1002041904w37cac39jeb9d60156a06c151@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Slow query: table iteration (8.3)  (Glenn Maynard <glenn@zewt.org>)
Ответы Re: Slow query: table iteration (8.3)  (Yeb Havinga <yebhavinga@gmail.com>)
Re: Slow query: table iteration (8.3)  (Yeb Havinga <yebhavinga@gmail.com>)
Список pgsql-performance
On Thu, Feb 4, 2010 at 4:09 AM, Glenn Maynard <glenn@zewt.org> wrote:
> But I'll be dropping this db into 8.4 soon to see if it helps
> anything, and I'll check again (and if it's still slow I'll post more
> details).  It's been a while and I might just have been doing
> something wrong.

Windowing doesn't want to scale for this case.  I'll simplify to give
an actual test case.

create table test_users (id serial primary key);
insert into test_users (id) select generate_series(1, 1000);
create table test (id serial primary key, score integer, user_id integer);
insert into test (user_id, score) select s.id, random() * 1000000 from
(select generate_series(1, 1000) as id) as s, generate_series(1,
1000);
create index test_1 on test (score);
create index test_2 on test (user_id, score desc);
analyze;

This generates a thousand users, with a thousand scores each.  This
finds the top score for each user (ignoring the detail of duplicate
scores; easy to deal with):

SELECT sub.id FROM (
    SELECT t.id, rank() OVER (PARTITION BY t.user_id ORDER BY score
DESC) AS rank
    FROM test t
) AS sub WHERE rank <= 1;

This does use the test_2 index (as intended), but it's still very
slow: 11 seconds on my system.

It seems like it's doing a *complete* scan of the index, generating
ranks for every row, and then filters out all but the first of each
rank.  That means it scales linearly with the total number of rows.
All it really needs to do is jump to each user in the index and pull
out the first entry (according to the "score desc" part of the test_2
index), which would make it scale linearly with the number of users.

The function version:

CREATE FUNCTION high_score_for_user(user_id int) RETURNS SETOF INTEGER
LANGUAGE SQL AS $$
       SELECT t.id FROM test t
       WHERE t.user_id = $1
       ORDER BY t.score DESC LIMIT 1
$$;
SELECT high_score_for_user(u.id) FROM test_users u;

runs in 100ms.

I think I'm stuck with either creating temporary functions with the
constants already replaced, or creating an SQL function that evaluates
a brand new query as a string as Yeb suggested.

--
Glenn Maynard

В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: foreign key constraint lock behavour in postgresql
Следующее
От: "Albe Laurenz"
Дата:
Сообщение: Re: foreign key constraint lock behavour in postgresql