Обсуждение: How do query optimizers affect window functions

Поиск
Список
Период
Сортировка

How do query optimizers affect window functions

От
Tianyin Xu
Дата:
Hi, Postgresql,

I want to understand how the query optimizers affect the output of the window functions.

For example, set "cpu_tuple_cost = 50" in postgresql.conf and start the server, I apply the regress test (make installcheck). The test of window function fails.

Checking the diff and I found the output of the window functions are different. For example,

For the following query:

SELECT sum(unique1) over (rows between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10;

The expected results are:

 sum | unique1 | four
-----+---------+------
  45 |       4 |    0
  41 |       2 |    2
  39 |       1 |    1
  38 |       6 |    2
  32 |       9 |    1
  23 |       8 |    0
  15 |       5 |    1
  10 |       3 |    3
   7 |       7 |    3
   0 |       0 |    0

But the real results are:

 sum | unique1 | four
-----+---------+------
  45 |       0 |    0   
  45 |       1 |    1   
  44 |       2 |    2   
  42 |       3 |    3   
  39 |       4 |    0   
  35 |       5 |    1   
  30 |       6 |    2   
  24 |       7 |    3   
  17 |       8 |    0   
   9 |       9 |    1   

There're altogether 6 queries in window test that outputs different query results.

I don't understand why the results are different. Intuitively, the queries show return the same results no matter what plan the optimizer choose.

I suspected the previous queries had some side effect on the latter one (e.g., change the current row), so I removed all the previous queries before this query in window.sql. But the result did not change.

Could anyone explain this behavior? Or point out how to investigate?

Thanks a lot!
Tianyin
 

Re: How do query optimizers affect window functions

От
Igor Romanchenko
Дата:
On Wed, Nov 14, 2012 at 10:12 AM, Tianyin Xu <tixu@cs.ucsd.edu> wrote:
Hi, Postgresql,

I want to understand how the query optimizers affect the output of the window functions.

For example, set "cpu_tuple_cost = 50" in postgresql.conf and start the server, I apply the regress test (make installcheck). The test of window function fails.

Checking the diff and I found the output of the window functions are different. For example,

For the following query:

SELECT sum(unique1) over (rows between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10;

The expected results are:

 sum | unique1 | four
-----+---------+------
  45 |       4 |    0
  41 |       2 |    2
  39 |       1 |    1
  38 |       6 |    2
  32 |       9 |    1
  23 |       8 |    0
  15 |       5 |    1
  10 |       3 |    3
   7 |       7 |    3
   0 |       0 |    0

But the real results are:

 sum | unique1 | four
-----+---------+------
  45 |       0 |    0   
  45 |       1 |    1   
  44 |       2 |    2   
  42 |       3 |    3   
  39 |       4 |    0   
  35 |       5 |    1   
  30 |       6 |    2   
  24 |       7 |    3   
  17 |       8 |    0   
   9 |       9 |    1   

There're altogether 6 queries in window test that outputs different query results.

I don't understand why the results are different. Intuitively, the queries show return the same results no matter what plan the optimizer choose.

I suspected the previous queries had some side effect on the latter one (e.g., change the current row), so I removed all the previous queries before this query in window.sql. But the result did not change.

Could anyone explain this behavior? Or point out how to investigate?

Thanks a lot!
Tianyin
 

Hi.
In short: if no explicit ordering specivied for a query the resulting set can be in any order. It is up to query optimizer to chose in what order the resulting tuples will be. 
The window function used in this test case rely on the order of the resulting set (it sums from current to the last) so it will generate different results for different query plans.

I think for this test cases (window functions) explicit ordering should be specified. In "normal" cases order dependent window functions are newer used without explicit ordering.

Re: How do query optimizers affect window functions

От
Jeff Janes
Дата:
On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu <tixu@cs.ucsd.edu> wrote:
> Hi, Postgresql,
>
> I want to understand how the query optimizers affect the output of the
> window functions.

Use "EXPLAIN".

One is an index scan, one is a bitmap scan.  They return rows in a
different order.

..

> I don't understand why the results are different. Intuitively, the queries
> show return the same results no matter what plan the optimizer choose.

My intuition is that the query should refuse to run at all, because
the results are order dependent and you haven't specified an ordering.

Cheers,

Jeff


Re: How do query optimizers affect window functions

От
Tianyin Xu
Дата:
Thanks a lot, Jeff!


On Wed, Nov 14, 2012 at 11:59 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu <tixu@cs.ucsd.edu> wrote:
> Hi, Postgresql,
>
> I want to understand how the query optimizers affect the output of the
> window functions.

Use "EXPLAIN".

One is an index scan, one is a bitmap scan.  They return rows in a
different order.

..

> I don't understand why the results are different. Intuitively, the queries
> show return the same results no matter what plan the optimizer choose.

My intuition is that the query should refuse to run at all, because
the results are order dependent and you haven't specified an ordering.


What do you mean by "refused to run"? You mean we have to specify the order when using the window functions? Could you explain more?

Thanks!
 
Cheers,

Jeff



--
Tianyin XU,
http://cseweb.ucsd.edu/~tixu/

Re: How do query optimizers affect window functions

От
Jeff Janes
Дата:
On Wed, Nov 14, 2012 at 2:16 PM, Tianyin Xu <tixu@cs.ucsd.edu> wrote:
> Thanks a lot, Jeff!
>
>
> On Wed, Nov 14, 2012 at 11:59 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu <tixu@cs.ucsd.edu> wrote:
>> > Hi, Postgresql,
>> >
>> > I want to understand how the query optimizers affect the output of the
>> > window functions.
>>
>> Use "EXPLAIN".
>>
>> One is an index scan, one is a bitmap scan.  They return rows in a
>> different order.
>>
>> ..
>>
>> > I don't understand why the results are different. Intuitively, the
>> > queries
>> > show return the same results no matter what plan the optimizer choose.
>>
>> My intuition is that the query should refuse to run at all, because
>> the results are order dependent and you haven't specified an ordering.
>>
>
> What do you mean by "refused to run"?

I mean that it could throw an error.  Kind of like the way this
currently throws an error:

select b, sum(b) from foo;
ERROR:  column "foo.b" must appear in the GROUP BY clause or be used
in an aggregate function.

To be clear, I am not saying that it does do this (clearly it does
not), just that my intuition is that it should do this.

> You mean we have to specify the order
> when using the window functions? Could you explain more?

Not all uses of window functions have results that depend on the
order.  If you only use "partition by", there would be no reason to
force an ordering, for example.

Cheers,

Jeff


Re: How do query optimizers affect window functions

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, Nov 14, 2012 at 2:16 PM, Tianyin Xu <tixu@cs.ucsd.edu> wrote:
>> What do you mean by "refused to run"?

> I mean that it could throw an error.  Kind of like the way this
> currently throws an error:

> select b, sum(b) from foo;
> ERROR:  column "foo.b" must appear in the GROUP BY clause or be used
> in an aggregate function.

> To be clear, I am not saying that it does do this (clearly it does
> not), just that my intuition is that it should do this.

The SQL standard says that underspecified window ordering gives you
implementation-dependent results, but not an error.  (Their use of
"implementation-dependent" basically means "unspecified".)

I think this is a fairly reasonable definition, since in many practical
cases it would be hard for the parser to tell whether the window
ordering was nailed down sufficiently to give a unique result, anyway.
(Even if we required you to give an ORDER BY for each column, there are
examples such as zero/minus-zero in float8 where that doesn't produce a
unique ordering.  And such a requirement would just be a pain in the
rear a lot of the time.)

It's also consistent with what you get if, for example, you use LIMIT
without an ORDER BY or with an ORDER BY that doesn't constrain the
results to a unique row ordering.

In practice it's on the user to be sure he's nailed down the row
ordering sufficiently to get the results he wants in these cases.

            regards, tom lane