Re: PostgreSQL 9.6 behavior change with set returning (funct).*

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: PostgreSQL 9.6 behavior change with set returning (funct).*
Дата
Msg-id CAKFQuwY3t-p0KiZ3qWxrj4KonJiEVBVcoBc_0rh2EV-5zAuKEQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PostgreSQL 9.6 behavior change with set returning (funct).*  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: PostgreSQL 9.6 behavior change with set returning (funct).*  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wednesday, March 23, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Regina Obe" <lr@pcorp.us> writes:
> In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev
> started failing.  I traced the issue down to a behavior change in 9.6 when
> dealing with output of set returning functions when used with (func).*
> syntax.

> CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text)
> RETURNS TABLE(id integer, junk1 text, junk2 text)
> ...
> -- Get 16 rows in 9.6, Get 8 rows in 9.5
> SELECT (dumpset(f.test, 'hello world' || f.test)).*
> FROM generate_series(1,4) As f(test)
> ORDER BY junk2;

I think this is a side-effect of 9118d03a8 ("When appropriate, postpone
SELECT output expressions till after ORDER BY").  Previously, although you
got N evaluations of the SRF which is pretty horrid, they were all in the
same targetlist and hence ran in sync and produced only the expected
number of rows (thanks to the otherwise-indefensible ExecTargetList
behavior by which multiple SRF tlist expressions produce a number of rows
equal to the least common multiple of their periods, not the product).
That commit causes the evaluation of dumpset(...).junk1 to be postponed
till after the Sort step, but the evaluation of dumpset(...).junk2
necessarily can't be.  So now you get dumpset(...).junk2 inflating
the original rowcount 2X, and then dumpset(...).junk1 inflating it
another 2X after the Sort.

We could remain bug-compatible with the old behavior by adding some
restriction to keep all the tlist SRFs getting evaluated at the same
plan step, at least to the extent that we can.  I think you could get
similar strange behaviors in prior releases if you used GROUP BY or
another syntax that might result in early evaluation of the sort column,
and we're not going to be able to fix that.  But we could prevent this
particular optimization from introducing new strangeness.

But I'm not really sure that we should.  The way that you *should*
write this query is

SELECT ds.*
FROM generate_series(1,4) AS f(test),
     dumpset(f.test, 'hello world' || f.test) AS ds
ORDER BY junk2;

which is both SQL-standard semantics and much more efficient than
SRF-in-tlist.  We've more or less deprecated SRF-in-tlist since we
introduced LATERAL in 9.3.  How much are we willing to do to stay
bug-compatible with old behaviors here?


My gut reaction is that this is an unnecessary regression for the sake of a performance optimization that is likely drowned out in the usage presented anyway.

The pivot for me would be how hard would it be to maintain the old behavior in this "more or less deprecated" scenario.  I have no way to judge that.

David J.

 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Petr Jelinek
Дата:
Сообщение: Re: multivariate statistics v14
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: NOT EXIST for PREPARE