Обсуждение: Clarify the ordering guarantees in combining queries (or lack thereof)

Поиск
Список
Период
Сортировка

Clarify the ordering guarantees in combining queries (or lack thereof)

От
Shay Rojansky
Дата:
Greetings.

I was trying to understand what - if any - are the guarantees with regards to ordering for combining queries (UNION/UNION ALL/...). From this message[1], it seems that UNION ALL does preserve the ordering of the operand queries, whereas UNION does not (presumably neither do INTERSECT, INTERSECT ALL, EXCEPT and EXCEPT ALL).

The documentation[2] makes no mention of this, I'd suggest adding a note clarifying this.

Thanks,

Shay

Re: Clarify the ordering guarantees in combining queries (or lack thereof)

От
"David G. Johnston"
Дата:
On Wed, Jul 13, 2022 at 5:08 PM Shay Rojansky <roji@roji.org> wrote:
Greetings.

I was trying to understand what - if any - are the guarantees with regards to ordering for combining queries (UNION/UNION ALL/...). From this message[1], it seems that UNION ALL does preserve the ordering of the operand queries, whereas UNION does not (presumably neither do INTERSECT, INTERSECT ALL, EXCEPT and EXCEPT ALL).

The documentation[2] makes no mention of this, I'd suggest adding a note clarifying this.


Since the documentation doesn't make a guarantee there is none.  If you want ordered output use ORDER BY.

David J.

Re: Clarify the ordering guarantees in combining queries (or lack thereof)

От
Shay Rojansky
Дата:

>> I was trying to understand what - if any - are the guarantees with regards to ordering for combining queries (UNION/UNION ALL/...). From this message[1], it seems that UNION ALL does preserve the ordering of the operand queries, whereas UNION does not (presumably neither do INTERSECT, INTERSECT ALL, EXCEPT and EXCEPT ALL).
>>
>> The documentation[2] makes no mention of this, I'd suggest adding a note clarifying this.
>
If you want ordered output use ORDER BY.

I don't see how that could be done. Consider the following:

(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);

If there's a guarantee that UNION ALL preserves ordering - as Tom seems to indicate in the thread quoted above - then the above works. If there's no such guarantee, then AFAIK the above can't be rewritten; putting the ORDER BY outside - on the results of the UNION ALL - would order all results rather than preserving each resultset's ordering.






Re: Clarify the ordering guarantees in combining queries (or lack thereof)

От
Pantelis Theodosiou
Дата:


On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji@roji.org> wrote:

>> I was trying to understand what - if any - are the guarantees with regards to ordering for combining queries (UNION/UNION ALL/...). From this message[1], it seems that UNION ALL does preserve the ordering of the operand queries, whereas UNION does not (presumably neither do INTERSECT, INTERSECT ALL, EXCEPT and EXCEPT ALL).
>>
>> The documentation[2] makes no mention of this, I'd suggest adding a note clarifying this.
>
If you want ordered output use ORDER BY.

I don't see how that could be done. Consider the following:

(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);

If there's a guarantee that UNION ALL preserves ordering - as Tom seems to indicate in the thread quoted above - then the above works. If there's no such guarantee, then AFAIK the above can't be rewritten; putting the ORDER BY outside - on the results of the UNION ALL - would order all results rather than preserving each resultset's ordering.

No, there is no guarantee. It's just that UNION ALL works this way today (preserving the order of the subselects) - and I'm not even sure about that, it may not preserve the order in all cases, with different indexes or partitioning or a parallel plan, etc.
In any case, there is no guarantee that the behaviour will not change in the future due to planner improvements.

Best regards
Pantelis Theodosiou
 

Re: Clarify the ordering guarantees in combining queries (or lack thereof)

От
Tom Lane
Дата:
Pantelis Theodosiou <ypercube@gmail.com> writes:
> On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji@roji.org> wrote:
>> I was trying to understand what - if any - are the guarantees with
>> regards to ordering for combining queries (UNION/UNION ALL/...).

> No, there is no guarantee. It's just that UNION ALL works this way today
> (preserving the order of the subselects) - and I'm not even sure about
> that, it may not preserve the order in all cases, with different indexes or
> partitioning or a parallel plan, etc.

Yeah, that.  You can get a parallelized plan today for UNION ALL:

=# explain analyze select * from foo union all select * from foo;
                                                                 QUERY PLAN
                    

--------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
         ->  Parallel Seq Scan on foo  (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084
rows=853333loops=3) 
         ->  Parallel Seq Scan on foo foo_1  (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299
rows=1280000loops=2) 
 Planning Time: 0.094 ms
 Execution Time: 488.352 ms

It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.

>>> If you want ordered output use ORDER BY.

>> I don't see how that could be done. Consider the following:
>> (SELECT id FROM data ORDER BY id)
>> UNION ALL
>> (SELECT id FROM data ORDER BY id DESC);

You do it like this:

=# explain analyze (select * from foo union all select * from foo) order by unique1;
                                                                    QUERY PLAN
                          

--------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=839258.04..889070.63 rows=4266672 width=244) (actual time=931.054..1707.780 rows=5120000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=839258.01..844591.35 rows=2133336 width=244) (actual time=924.821..1132.096 rows=1706667 loops=3)
         Sort Key: foo.unique1
         Sort Method: external merge  Disk: 433552kB
         Worker 0:  Sort Method: external merge  Disk: 423400kB
         Worker 1:  Sort Method: external merge  Disk: 415592kB
         ->  Parallel Append  (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.051..218.476 rows=1706667
loops=3)
               ->  Parallel Seq Scan on foo  (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.030..79.118
rows=853333loops=3) 
               ->  Parallel Seq Scan on foo foo_1  (cost=0.00..98942.68 rows=1066668 width=244) (actual
time=0.073..118.566rows=1280000 loops=2) 
 Planning Time: 0.109 ms
 Execution Time: 1830.390 ms
(13 rows)

The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.

            regards, tom lane



Re: Clarify the ordering guarantees in combining queries (or lack thereof)

От
"David G. Johnston"
Дата:
On Thursday, July 14, 2022, Shay Rojansky <roji@roji.org> wrote:

If there's a guarantee that UNION ALL preserves ordering - as Tom seems to indicate in the thread quoted above - then the above works. If there's no such guarantee, then AFAIK the above can't be rewritten; putting the ORDER BY outside - on the results of the UNION ALL - would order all results rather than preserving each resultset's ordering.


Yes, an order by outside the union will sort the union results as a whole.  You can still write an order by and the union all so you get any conceivable ordering, though it may possibly require putting the union into a subquery depending on the order and output column combination desired.

David J.

Re: Clarify the ordering guarantees in combining queries (or lack thereof)

От
Shay Rojansky
Дата:
>> No, there is no guarantee. It's just that UNION ALL works this way today
>> (preserving the order of the subselects) - and I'm not even sure about
>> that, it may not preserve the order in all cases, with different indexes or
>> partitioning or a parallel plan, etc.
>
> Yeah, that.  You can get a parallelized plan today for UNION ALL:

...

Since the documentation doesn't make a guarantee there is none.

Thanks all for the confirmation.

I'd still suggest documenting the lack of guarantee; yes, mathematically it may be correct to not document lack of guarantees, but users can come with various expectations and misunderstandings (I also wasn't clear on this specifically for UNION ALL), and it's always good to say this kind of thing explicitly.