Обсуждение: query with offset stops using index scan

Поиск
Список
Период
Сортировка

query with offset stops using index scan

От
"Stanislav Raskin"
Дата:

Hello everybody,

 

I have some weird behaviour with a pretty simple query, which I use in a web front end to browse through pages of data.

 

SELECT

            foo.id, get_processing_status(foo.id) AS status, foo.name, foo.valid_until

FROM

            foo

 WHERE foo.active  AND foo.valid_until < 1220186528 AND NOT foo.locked

ORDER BY foo.id DESC

LIMIT  25

OFFSET 100

 

This very query works quite quickly, and the query plan looks like this:

 

"Limit  (cost=36.04..45.05 rows=25 width=63)"

"  ->  Index Scan Backward using foo_pkey on foo  (cost=0.00..511.35 rows=1419 width=63)"

"        Filter: (active AND (valid_until < 1220186528) AND (NOT locked))"

 

Now, if I increase OFFSET slowly, it works all the same way, until OFFSET reaches the value of 750. Then, the planner refuses to use an index scan and does a plain seq scan+sort, which makes the query about 10-20 times slower:

 

"Limit  (cost=272.99..273.05 rows=25 width=63)"

"  ->  Sort  (cost=271.11..274.66 rows=1419 width=63)"

"        Sort Key: id"

"        ->  Seq Scan on foo (cost=0.00..196.82 rows=1419 width=63)"

"              Filter: (active AND (valid_until < 1220186528) AND (NOT locked))"

 

I use 8.1.4, and I did a vacuum full analyze before running the queries.

 

What might be the issue here? Could a reindex on the pkey help?

 

Kind Regards

 

Stanisalv Raskin

 

 

Re: query with offset stops using index scan

От
"Scott Marlowe"
Дата:
On Sun, Aug 31, 2008 at 7:14 AM, Stanislav Raskin <sr@brainswell.de> wrote:
> Hello everybody,
>
> Now, if I increase OFFSET slowly, it works all the same way, until OFFSET
> reaches the value of 750. Then, the planner refuses to use an index scan and
> does a plain seq scan+sort, which makes the query about 10-20 times slower:
>
> I use 8.1.4, and I did a vacuum full analyze before running the queries.

If there's a chance to upgrade to 8.3 please do so.  While 8.1 was a
solid reliable workhorse of a database, there's been a lot of work
done in general for better performance and more features.  It likely
won't fix this one problem, but it's often smarter about corner cases
in query plans than 8.1 so it's worth looking into.

Now back to your problem.  What's happening here is that the query
planner is switching plans because it thinks the sequential scan and
sort are cheaper.  and at some point it will likely be right.  That's
because a random page cost is much higher than a sequential page cost.
 So at some point, say when you're grabbing 2% to 25% of a table, it
will switch to sequential scans.

Now, if the data is all cached, then it's still quicker to do the
index scan further along than to use a seq scan and a sort.  Unless
your table is clustered to the index you're sorting on, a Seq scan
will almost always win if you need the whole table.

However, you may be in a position where a multi-column index and
clustering on id will allow you to run this offset higher.  It's still
a poor performer for large chunks of large tables.

first cluster on the primary key id, then create a three column index
for (active, valid_until, locked)  Note that the order should be from
the most choosey to least choosey column, generally.  So assuming only
a tiny percentage of records meet valid_until, make it the first
column, and so forth.  A query like:

select active, count(active) from table group by active;

will give you an idea there.

In the long run if you want good performance on larger data sets (i.e.
higher offset numbers) you'll likely need to switch to either cursors,
or using "where id between x and x+y" or lookup tables, or something
like that.

Re: query with offset stops using index scan

От
"Stanislav Raskin"
Дата:
> If there's a chance to upgrade to 8.3 please do so.

I am aware of the benefits with 8.3, but such an upgrade would require quite
some changes in our application, including introduction of explicit casting
mechanisms. We are going to do so sooner or later, but right now we need to
focus on other stuff.

> What's happening here is that the query
> planner is switching plans because it thinks the sequential scan and
> sort are cheaper.  and at some point it will likely be right.

Thank you very much for pointing out the issue.
I am still a bit puzzled, because there are only about 2000 data sets in
this table. It's not like we're handling millions of rows.

Clustering on the id did the trick. Now the planner always chooses to use
the index.

Unfortunately, I cannot use multi-column indexes here, because the
expressions in the WHERE statement can vary quite strongly, depending on
user input.

Cursors are not an option, because it is a web application, meaning that I
have to use a new connection for basically every HTTP request.

The "where id between x and x+y" is indeed much, much faster, but it
presumes the knowledge of x and y, which is not the case, because serial ids
are not necessarily continuous (i.e. if some data sets were deleted).





-----Ursprüngliche Nachricht-----
Von: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Gesendet: Sonntag, 31. August 2008 17:26
An: Stanislav Raskin
Cc: pgsql-general@postgresql.org
Betreff: Re: [GENERAL] query with offset stops using index scan

On Sun, Aug 31, 2008 at 7:14 AM, Stanislav Raskin <sr@brainswell.de> wrote:
> Hello everybody,
>
> Now, if I increase OFFSET slowly, it works all the same way, until OFFSET
> reaches the value of 750. Then, the planner refuses to use an index scan
and
> does a plain seq scan+sort, which makes the query about 10-20 times
slower:
>
> I use 8.1.4, and I did a vacuum full analyze before running the queries.

If there's a chance to upgrade to 8.3 please do so.  While 8.1 was a
solid reliable workhorse of a database, there's been a lot of work
done in general for better performance and more features.  It likely
won't fix this one problem, but it's often smarter about corner cases
in query plans than 8.1 so it's worth looking into.

Now back to your problem.  What's happening here is that the query
planner is switching plans because it thinks the sequential scan and
sort are cheaper.  and at some point it will likely be right.  That's
because a random page cost is much higher than a sequential page cost.
 So at some point, say when you're grabbing 2% to 25% of a table, it
will switch to sequential scans.

Now, if the data is all cached, then it's still quicker to do the
index scan further along than to use a seq scan and a sort.  Unless
your table is clustered to the index you're sorting on, a Seq scan
will almost always win if you need the whole table.

However, you may be in a position where a multi-column index and
clustering on id will allow you to run this offset higher.  It's still
a poor performer for large chunks of large tables.

first cluster on the primary key id, then create a three column index
for (active, valid_until, locked)  Note that the order should be from
the most choosey to least choosey column, generally.  So assuming only
a tiny percentage of records meet valid_until, make it the first
column, and so forth.  A query like:

select active, count(active) from table group by active;

will give you an idea there.

In the long run if you want good performance on larger data sets (i.e.
higher offset numbers) you'll likely need to switch to either cursors,
or using "where id between x and x+y" or lookup tables, or something
like that.


Re: query with offset stops using index scan

От
Alvaro Herrera
Дата:
Stanislav Raskin wrote:

> Now, if I increase OFFSET slowly, it works all the same way, until OFFSET
> reaches the value of 750. Then, the planner refuses to use an index scan and
> does a plain seq scan+sort, which makes the query about 10-20 times slower:

You may want to try setting enable_seqscan to off before that query (and
making sure you turn it back on; preferably use SET LOCAL inside a
transaction block) so that it gives more preference to the indexscan.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.