Thread: Optimize ORDER BY ... LIMIT


Optimize ORDER BY ... LIMIT

From
Gregory Stark
Date:
I've been looking at doing the following TODO item:
   Allow ORDER BY ... LIMIT # to select high/low value without sort or index
   using a sequential scan for highest/lowest values

   Right now, if no index exists, ORDER BY ... LIMIT # requires we sort all
   values to return the high/low value. Instead, the idea is to do a
   sequential scan to find the high/low value, thus avoiding the sort.
   MIN/MAX already does this, but not for LIMIT > 1.
 

I think this is pretty important to cover at some point because really _not_
doing this is just wrong. We're simply not supporting the correct plan for this
type of query. Currently we're using an O(n log n) plan when the right plan
would usually be O(n). (As in, it's actually O(n log m) where m is usually
small and not interesting.)
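
To make that concrete, a query of this shape (table and column names purely
illustrative) is the case in question; with no index on the sort key the
current plan sorts every qualifying row just to return ten of them:

    SELECT * FROM payments ORDER BY amount DESC LIMIT 10;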

The way I see to do this is to still use a Sort node and a tuplesort, but to
arrange to get the maximum number of tuples needed down to the tuplesort so it
can throw out tuples as it sorts.

My plan is to have tuplesort reuse the existing heap code it uses for tape
merges, only keeping the memtuples array in a max-heap (instead of the min-heap
it uses now -- that means having a tuplesortstate flag indicating which order
and having the tuplesort_heap* functions check that flag). When it reaches the
limit it can throw away either the new element or the top element on every
insert.

I considered using a simple insertion-sort instead but was worried that the
performance would degrade as the limit clause grows large. I don't think
that's a major use case but I don't like the idea of an O(n^2) algorithm lying
in wait to ambush someone.

Also, because heap sort is slower than qsort (on average anyways) it makes
sense to not bother with the heap until the number of tuples grows well beyond
the limit or until it would otherwise spill to disk.

To actually get this information to the tuplesort, it has to be fed down to
the SortState from the LimitState somehow. This I'm not sure how to do. There
isn't currently any abstract interface between nodes for passing information
like this.

The simple solution is that ExecLimit could just peek at its outerPlanState
and if it's a SortState it can set some fields so the SortState can know to
pass the information to the tuplesort.

I've also considered a more abstract interface such as adding an ExecAdvise()
function that would pass some sort of structure (a node?) down with the
information. This seems like overkill for a single integer but I wonder if
there would be other consumers of such an interface. 

The current eflags could be swallowed by this, though I don't see any
particular advantage. More realistically, a Unique node could also inform a
Sort node that it can throw away duplicates as it sorts. A limit could even be
passed *through* a Unique node as long as the Sort understands how to handle
the combination properly. In other areas, a Hash Aggregate could start throwing
away elements once the number of elements in the hash grows to the limit.

Alternatively we could have Limit(Sort()), Unique(Sort()), and
Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not
introduce the Limit and Unique nodes at all. I would worry about duplicated
code in that case though, in particular it seems like there would be cases
where we still want to use qsort rather than throw away unneeded tuples. But
not throwing away unneeded tuples means reimplementing all of nodeLimit in
nodeSort for those cases. And that doesn't help with other cases like
Hash Aggregate.
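
For reference, those plan shapes correspond roughly to queries like these
(illustrative only, hypothetical table t):

    SELECT * FROM t ORDER BY x LIMIT 10;               -- Limit(Sort)
    SELECT DISTINCT x FROM t;                          -- Unique(Sort)
    SELECT DISTINCT x FROM t ORDER BY x LIMIT 10;      -- Limit(Unique(Sort))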

Or am I overthinking this and having some state nodes peek inside other state
nodes is normal?

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Optimize ORDER BY ... LIMIT

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I've been looking at doing the following TODO item:
>     Allow ORDER BY ... LIMIT # to select high/low value without sort or index
>     using a sequential scan for highest/lowest values

> I think this is pretty important to cover at some point because really _not_
> doing this is just wrong.

I can't get all *that* excited about it, since an index solves the
problem.

> The way I see to do this is to still use a Sort node and a tuplesort, but to
> arrange to get the maximum number of tuples needed down to the tuplesort so
> it can throw out tuples as it sorts.

The implementation that was proposed in the earlier discussion did not
involve hacking the sort code beyond recognition ;-).

I believe a better way to think about this would be as an aggregate that
remembers the top N rows.  It can't quite be an aggregate as it stands
(unless we want to invent aggregates that can return SETOF?)  but I
think there might be some useful overlap with the SQL2003
window-function concept.
        regards, tom lane


Re: Optimize ORDER BY ... LIMIT

From
Andrew Dunstan
Date:
Tom Lane wrote:
> (unless we want to invent aggregates that can return SETOF?)  
>   

Doesn't sound like a bad idea at all ...

cheers

andrew



Re: Optimize ORDER BY ... LIMIT

From
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I believe a better way to think about this would be as an aggregate that
> remembers the top N rows.  

Wouldn't such a thing just be a reimplementation of a tuplestore though? I
mean, it's storing tuples you feed it, sorting them, and spitting them back
out in sorted order.

What would you do if the set of tuples turned out to be larger than you
expected and didn't fit in memory? Create a tuplesort and pass them on to it?

I've already looked at tuplesort and the changes there are minimal. The hard
part is what to do in the planner and executor to get the information to the
tuplesort. Do we want the plan to look the way it does now, or use some new
sort of node that consolidates the limit and the sort in the same place?

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Optimize ORDER BY ... LIMIT

From
Martijn van Oosterhout
Date:
On Fri, Sep 15, 2006 at 05:30:27PM +0100, Gregory Stark wrote:
> Also, because heap sort is slower than qsort (on average anyways) it makes
> sense to not bother with the heap until the number of tuples grows well beyond
> the limit or until it would otherwise spill to disk.

The thought that comes to mind is to leave the sorting as is, but
change the code that writes to the tapes to stop writing once it hits
the limit. So each tape will never have more than N tuples, where N is
the limit. This would be fairly unobtrusive because none of the other
code actually needs to care.

> Alternatively we could have Limit(Sort()), Unique(Sort()), and
> Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not
> introduce the Limit and Unique nodes at all.

I don't think it's easy to merge Unique and Sort, mainly because the
fields you're doing the Unique on are probably not the fields you're
sorting on, so you're probably not saving much.

However, merging Unique/Distinct/GroupBy is another avenue that has
been considered.

In general LIMIT is not handled badly, because we don't execute further
once we have the number of tuples. Only nodes that materialize are a
problem, basically Sort being the common one.

> Or am I overthinking this and having some state nodes peek inside other state
> nodes is normal?

I don't think so. In general the parser and planner poke around quite a
bit, but once the optimizer has been over it, the plan has to be
static, for rescans, backward scans, executing stored plans, etc.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Optimize ORDER BY ... LIMIT

From
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>
>> I think this is pretty important to cover at some point because really _not_
>> doing this is just wrong.
>
> I can't get all *that* excited about it, since an index solves the
> problem.

Well I'm not all *that* excited about it either; it's just another plan, and
there are an infinite number of possible plans out there we could implement
for various corner cases.

But just in case it's not clear for anyone, the usual use case for this is
paging results on a web page. As much as I normally try to convince people
they don't want to do it that way, they usually do end up with it implemented
using limit/offset. And Postgres currently is absolutely *awful* at running
those queries.

Often the killer requirement that makes it infeasible to create an index is
precisely that they want to be able to sort on any of a long list of possible
keys. Creating dozens of indexes on every table isn't too appealing.

And in any case the query is often a join where the data in the sort key isn't
even all coming from the same table or where you need to use other indexes to
fetch the data prior to the sort.
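
For example (purely illustrative schema), if the sort key is computed from
columns of more than one table, no index on either table can provide the
ordering:

    SELECT o.order_id, c.name
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id
     ORDER BY o.total * (1 - c.discount) DESC
     LIMIT 20;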

I won't discourage anyone from working on OLAP queries and this is indeed a
similar idea. I suspect the same functionality in tuplesort of being able to
set a maximum number of tuples to keep will be useful there too. 

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Optimize ORDER BY ... LIMIT

From
Alvaro Herrera
Date:
Gregory Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > I believe a better way to think about this would be as an aggregate that
> > remembers the top N rows.  
> 
> Wouldn't such a thing just be a reimplementation of a tuplestore though? I
> mean, it's storing tuples you feed it, sorting them, and spitting them back
> out in sorted order.

I don't know if this is the same thing you are talking about, but Oleg
talked to me at the conference about "partial sort", which AFAICS is
about the same thing you are talking about.  I think Teodor submitted a
patch to implement it, which was rejected for not being general
enough.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Optimize ORDER BY ... LIMIT

From
Gregory Stark
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:

> On Fri, Sep 15, 2006 at 05:30:27PM +0100, Gregory Stark wrote:
>> Also, because heap sort is slower than qsort (on average anyways) it makes
>> sense to not bother with the heap until the number of tuples grows well beyond
>> the limit or until it would otherwise spill to disk.
>
> The thought that comes to mind is to leave the sorting as is, but
> change the code that writes to the tapes to stop writing once it hits
> the limit. So each tape will never have more than N tuples, where N is
> the limit. This would be fairly unobtrusive because none of the other
> code actually needs to care.

I'm sorry, I forgot to mention that part of my analysis. Once you reach disk
any chance of optimising the limit case is pretty much out the window. You
could trim some tuples from each tape but unless you're sorting truly
stupendous amounts of data I doubt it would really help much.

I think it only makes sense to look at the in-memory case. Instead of qsorting
thousands of records or, worse, spilling millions of records to disk and doing
an external sort only to use the top 10 and throw the rest away, we throw
tuples away before they accumulate in memory in the first place.

>> Alternatively we could have Limit(Sort()), Unique(Sort()), and
>> Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not
>> introduce the Limit and Unique nodes at all. 
>
> I don't think it's easy to merge Unique and Sort, mainly because the
> fields you're doing the Unique on are probably not the fields you're
> sorting on, so you're probably not saving much.

On the contrary, I think the vast majority of the time you have a Unique(Sort)
it will be on the same key, because it will be caused by a SELECT DISTINCT.
Now that I actually test it I see there are more nodes that could implement
this (namely GroupAgg), so I'm thinking more and more about having an abstract
way to pass information down through the nodes rather than handling just
Limit/Sort specifically.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Optimize ORDER BY ... LIMIT

From
mark@mark.mielke.cc
Date:
On Fri, Sep 15, 2006 at 08:22:50PM +0100, Gregory Stark wrote:
> But just in case it's not clear for anyone, the usual use case for
> this is paging results on a web page. As much as I normally try to
> convince people they don't want to do it that way, they usually do
> end up with it implemented using limit/offset. And Postgres
> currently is absolutely *awful* at running those queries.

I'm curious, as I may be such an offender. What alternatives exist?

I think you mean the concept of showing a page of information that
you can navigate forwards and backwards a page, or select a range.

What alternatives to limit/offset exist? If there are thousands or
more results, I have trouble with an idea that the entire results
should be queried, and cached, displaying only a fraction.

I think most or all of the times I do this, an index is available,
so perhaps that is why I don't find this issue problematic?

For implementation, I think something simple is best:
   - limit X offset Y

This means keeping a set of X+Y tuples as follows:

   1) If the set of X+Y tuples still has room, insert using a binary search
      that retains the ordering characteristics that would be had if
      limit/offset had not been used.

   2) If the row is less than the X+Y'th tuple, remove the X+Y'th
      tuple from the set, and insert the row as per 1).

   3) Otherwise, ignore the tuple.

At the end, you return from the set starting at Y+1, and ending at Y+X.

If X+Y tuples would take up too much memory, this plan should be
skipped, and the general routines used instead.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                        and in the darkness bind them...
                          http://mark.mielke.cc/



Re: Optimize ORDER BY ... LIMIT

From
Gregory Stark
Date:
mark@mark.mielke.cc writes:

> On Fri, Sep 15, 2006 at 08:22:50PM +0100, Gregory Stark wrote:
>
> I'm curious, as I may be such an offender. What alternatives exist?
>
> I think you mean the concept of showing a page of information that
> you can navigate forwards and backwards a page, or select a range.
>
> What alternatives to limit/offset exist? If there are thousands or
> more results, I have trouble with an idea that the entire results
> should be queried, and cached, displaying only a fraction.
>
> I think most or all of the times I do this, an index is available,
> so perhaps that is why I don't find this issue problematic?

If you have a unique index and instead of using OFFSET you pass along the last
key of the previous page then you can use a WHERE clause on the indexed column
to go straight to the correct page rather than using OFFSET.

So for example if you're displaying bank transactions sorted by transaction_id
you have the "next page" button pass along the "start_transaction_id=nnn"
where nnn is the last transaction_id of the previous page. Then on the next
page you do a query with "WHERE transaction_id > ?" and pass that column. You
still use ORDER BY transaction_id and LIMIT.
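
As a rough sketch (hypothetical transactions table; $last_seen_id stands for
the value carried over from the previous page):

    -- first page
    SELECT * FROM transactions ORDER BY transaction_id LIMIT 20;

    -- next page: carry the last transaction_id forward instead of an OFFSET
    SELECT * FROM transactions
     WHERE transaction_id > $last_seen_id
     ORDER BY transaction_id
     LIMIT 20;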

This has the upside that your query always takes the same amount of time.

Using OFFSET means later pages take longer, possibly much longer, than earlier
pages. Possibly so much longer that it causes enough i/o to bring down your
web server etc.

It does have lots of downsides as well. You cannot provide direct links to the
later pages aside from the next page. It's difficult to provide a proper
"previous page" button. etc. Early in the web's history these were reasonable
trade-offs but nowadays it's hard to convince people that their $bazillion
machine can't handle sorting a couple thousand records for each page view and
they should sacrifice the user experience for that. So I've pretty much given
up on that battle.

> For implementation, I think something simple is best:
[...]

You just described using an insertion sort. Even if I went with insertion sort
instead of heap sort I think it makes sense to do it inside tuplesort. 

> If X+Y tuples would take up too much memory, this plan should be
> skipped, and the general routines used instead.

And that's a big part of why...


-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Optimize ORDER BY ... LIMIT

From
Gregory Stark
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:

> I don't know if this is the same thing you are talking about, but Oleg
> talked to me at the conference about "partial sort", which AFAICS is
> about the same thing you are talking about.  I think Teodor submitted a
> patch to implement it, which was rejected for not being general
> enough.


Oof, you have a long memory. Oleg does reference such a thing in his 2002 post
that ended up resulting in the TODO item. I can't find the original patch but
I doubt any patch against 7.1 is going to be all that helpful in understanding
what to do today.

I'm also confused how he only saw a factor of 6 improvement in reading the top
100 out of a million. I would expect much better.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: [SPAM?] Re: Optimize ORDER BY ... LIMIT

From
mark@mark.mielke.cc
Date:
On Fri, Sep 15, 2006 at 10:06:16PM +0100, Gregory Stark wrote:
> > I'm curious, as I may be such an offender. What alternatives exist?
> > ...
> > What alternatives to limit/offset exist? If there are thousands or
> > more results, I have trouble with an idea that the entire results
> > should be queried, and cached, displaying only a fraction.

> If you have a unique index and instead of using OFFSET you pass
> along the last key of the previous page then you can use a WHERE
> clause on the indexed column to go straight to the correct page
> rather than using OFFSET.  So for example if you're displaying bank
> transactions sorted by transaction_id you have the "next page"
> button pass along the "start_transaction_id=nnn" where nnn is the
> last transaction_id of the previous page. Then on the next page you
> do a query with "WHERE transaction_id > ?" and pass that column. You
> still use ORDER BY transaction_id and LIMIT.

I found benefits to doing things this way that were not related to
performance. If the number of items leading up to your page changes,
remembering the offset can result in listing a completely different
page than you intended when paging forward or backwards. On my pages,
I prefer to define one of the items as the item I am looking at, and
page seeking is always +/- 1 page from that item. This means that I
am pretty close to what you are suggesting - except - because I do
this for functional reasons, and not for performance reasons, I am
doing something worse.

I use COUNT(*) and WHERE as you describe above to map this identifier
to an offset, and then a second SELECT with LIMIT/OFFSET to describe
the object and those that follow on the page.
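
Roughly, the current pair of queries looks like this (names illustrative;
$current_id is the item the page is anchored on):

    -- 1) map the anchor item to an offset
    SELECT COUNT(*) FROM items WHERE item_id < $current_id;

    -- 2) then fetch the page using that offset
    SELECT * FROM items ORDER BY item_id LIMIT 20 OFFSET $offset;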

According to your suggestion, I think this means I should track the
identifier with the last offset, displaying the offset to the user for
information purposes only, not using it for any queries, and then use
WHERE and LIMIT?

I tried this out. EXPLAIN ANALYZE tells me that for a random
offset=200, limit=20 case I tried, the simple change changes it from
index scanning 207 rows to find 7 rows, to index scanning 7 rows to
find 7 rows. Sweet. Unfortunately, the time to complete is unchanged
around 1.3+/-0.2 milliseconds. Looks like my system has bigger
bottlenecks. :-)

Thanks,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                        and in the darkness bind them...
                          http://mark.mielke.cc/



Re: Optimize ORDER BY ... LIMIT

From
"Nicolas Barbier"
Date:
2006/9/16, Gregory Stark <stark@enterprisedb.com>:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
>
> > I don't know if this is the same thing you are talking about, but Oleg
> > talked to me at the conference about "partial sort", which AFAICS is
> > about the same thing you are talking about.  I think Teodor submitted a
> > patch to implement it, which was rejected for not being general
> > enough.
>
> Oof, you have a long memory. Oleg does reference such a thing in his 2002 post
> that ended up resulting in the TODO item. I can't find the original patch but
> I doubt any patch against 7.1 is going to be all that helpful in understanding
> what to do today.
>
> I'm also confused how he only saw a factor of 6 improvement in reading the top
> 100 out of a million. I would expect much better.

For example, consider the case in which 6 passes are needed to do the
full sort. Then, for a "partial sort", at least the first of these
passes has to be fully executed, because one needs to read at least
all the data once to find the "top n".

greetings,
Nicolas

-- 
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html