Обсуждение: Strange behavior of some volatile function like random(), nextval()
<p>Hello!<p>Got some strange behavior of random() function:<p>postgres=# select (select random() ) from generate_series(1,10)as i;<br /> random<br /> -------------------<br /> 0.831577288918197<br /> 0.831577288918197<br/> 0.831577288918197<br /> 0.831577288918197<br /> 0.831577288918197<br /> 0.831577288918197<br/> 0.831577288918197<br /> 0.831577288918197<br /> 0.831577288918197<br /> 0.831577288918197<br/> (10 rows)<br /><br /> postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;<br/> ?column?<br /> --------------------<br /> 0.97471913928166<br /> 0.0532126761972904<br /> 0.331358563620597<br/> 0.0573496259748936<br /> 0.321165383327752<br /> 0.48836630070582<br /> 0.444201893173158<br/> 0.0729857799597085<br /> 0.661443184129894<br /> 0.706566562876105<br /> (10 rows)<p>postgres=#explain select (select random() ) from generate_series(1,10) as i;<br /> QUERY PLAN<br /> --------------------------------------------------------------------------<br/> Function Scan on generate_series i (cost=0.02..10.01rows=1000 width=0)<br /> InitPlan 1 (returns $0)<br /> -> Result (cost=0.00..0.01 rows=1 width=0)<br/> (3 rows)<br /><br /> postgres=# explain select (select random()+i*0 ) from generate_series(1,10) as i;<br /> QUERY PLAN<br /> --------------------------------------------------------------------------<br/> Function Scan on generate_series i (cost=0.00..30.00rows=1000 width=4)<br /> SubPlan 1<br /> -> Result (cost=0.00..0.02 rows=1 width=0)<br /> (3rows)<p>postgres=# \df+ random();<br /> List of functions<br/> Schema | Name | Result data type | Argument data types | Type | Security | Volatility | Owner | Language | Source code | Description<br /> ------------+--------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+--------------<br /> pg_catalog | random | double precision | | normal | invoker | volatile | postgres | internal |drandom | random value<br /> (1 row)<p><br /><p>Also: postgres=# create sequence test;<br /> CREATE SEQUENCE<br /> postgres=#SELECT (SELECT nextval('test')) FROM generate_series(1,10) as i;<br /> nextval<br /> ---------<br /> 1<br/> 1<br /> 1<br /> 1<br /> 1<br /> 1<br /> 1<br /> 1<br /> 1<br/> 1<br /> (10 rows)<br /><br /> postgres=# SELECT (SELECT nextval('test')+i*0) FROM generate_series(1,10) asi;<br /> ?column?<br /> ----------<br /> 2<br /> 3<br /> 4<br /> 5<br /> 6<br/> 7<br /> 8<br /> 9<br /> 10<br /> 11<br /> (10 rows)<br /><br /><div class="gmail_default"><br/> postgres=# \df+ nextval() ;<br /> List of functions<br /> Schema | Name | Resultdata type | Argument data types | Type | Security | Volatility | Owner | Language | Source code | Description<br/> ------------+---------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+---------------------<br /> pg_catalog | nextval | bigint | regclass | normal | invoker | volatile | postgres | internal |nextval_oid | sequence next value<br /> (1 row)<br /></div><p><br /><p>Both function is volatile so from docs :<p>"A VOLATILEfunction can do anything, including modifying the database. It can return different results on successive calls withthe same arguments. The optimizer makes no assumptions about the behavior of such functions. A query using a volatilefunction will re-evaluate the function at every row where its value is needed."<br /> Something wrong with executor?Is it bug or executor feature related with subquery?<br /><br /><pre class="moz-signature" cols="72">-- Alex Ignatov Postgres Professional: <a class="moz-txt-link-freetext" href="http://www.postgrespro.com">http://www.postgrespro.com</a> The Russian Postgres Company </pre>
On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote: > Hello! > > Got some strange behavior of random() function: > > postgres=# select (select random() ) from generate_series(1,10) as i; > random > ------------------- > 0.831577288918197 > [...] > (10 rows) I recall that this is treated as an implicit LATERAL, meaning that random() is calculated only once. > postgres=# select (select random()+i*0 ) from generate_series(1,10) as i; > ?column? > -------------------- > 0.97471913928166 > [...] > (10 rows) But not that. So those results do not surprise me. -- Michael
Re: Strange behavior of some volatile function like random(), nextval()
От
"David G. Johnston"
Дата:
More specifically...
On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
> Hello!
>
> Got some strange behavior of random() function:
>
> postgres=# select (select random() ) from generate_series(1,10) as i;
> random
> -------------------
> 0.831577288918197
> [...]
> (10 rows)
I recall that this is treated as an implicit LATERAL, meaning that
random() is calculated only once.
A non-correlated (i.e., does not refer to outer variables) subquery placed into the target-list need only have its value computed once - so that is what happens. The fact that a volatile function can return different values given the same arguments doesn't mean much when the function is only ever called a single time.
> postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
> ?column?
> --------------------
> 0.97471913928166
> [...]
> (10 rows)
But not that. So those results do not surprise me.
A correlated subquery, on the other hand, has to be called once for every row and is evaluated within the context supplied by said row. Each time random is called it returns a new value.
Section 4.2.11 (9.6 docs)
Maybe this could be worded better but the first part talks about a single execution while "any one execution" is mentioned in reference to "the surrounding query".
I do think that defining "correlated" and "non-correlated" subqueries within this section would be worthwhile.
David J.
On 29.06.2016 15:30, David G. Johnston wrote:
In this subquery(below) we have reference to outer variables but it is not working as it should(or i dont understand something):More specifically...On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
> Hello!
>
> Got some strange behavior of random() function:
>
> postgres=# select (select random() ) from generate_series(1,10) as i;
> random
> -------------------
> 0.831577288918197
> [...]
> (10 rows)
I recall that this is treated as an implicit LATERAL, meaning that
random() is calculated only once.A non-correlated (i.e., does not refer to outer variables) subquery placed into the target-list need only have its value computed once - so that is what happens. The fact that a volatile function can return different values given the same arguments doesn't mean much when the function is only ever called a single time.
> postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
> ?column?
> --------------------
> 0.97471913928166
> [...]
> (10 rows)
But not that. So those results do not surprise me.A correlated subquery, on the other hand, has to be called once for every row and is evaluated within the context supplied by said row. Each time random is called it returns a new value.Section 4.2.11 (9.6 docs)Maybe this could be worded better but the first part talks about a single execution while "any one execution" is mentioned in reference to "the surrounding query".I do think that defining "correlated" and "non-correlated" subqueries within this section would be worthwhile.David J.
postgres=# postgres=# select id, ( select string_agg('a','') from generate_series(1,trunc(10*random()+1)::int) where id=id) from generate_series(1,10) as id;
id | string_agg
----+------------
1 | aaa
2 | aaa
...
but this query(with reference to outer var) working perfectly:
postgres=# select id,(select random() where id=id) from generate_series(1,10) as id;
id | random
----+--------------------
1 | 0.974509597290307
2 | 0.219822214450687
...
Also this query is working good( (id-id) do the job):
postgres=# select id, ( select string_agg('a','') from generate_series(1,trunc(10*random()+1)::int+(id-id)) ) from generate_series(1,10) as id;
id | string_agg
----+------------
1 | aaaaaaa
2 | aaaaa
...
It means that even reference to outer variables doesn't mean that executor execute volatile function from subquery every time. Or there is something else what i should know?
Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[ Please do not quote the entire thread in each followup. That's disrespectful of your readers' time, and will soon cause people to stop reading the thread, meaning you don't get answers. ] Alex Ignatov <a.ignatov@postgrespro.ru> writes: > In this subquery(below) we have reference to outer variables but it is > not working as it should(or i dont understand something): > postgres=# postgres=# select id, ( select string_agg('a','') from > generate_series(1,trunc(10*random()+1)::int) where id=id) from > generate_series(1,10) as id; The inner generate_series() call does not contain any outer references, so it doesn't get recomputed. There's a comment in ExecReScanFunctionScan about that: * Here we have a choice whether to drop the tuplestores (and recompute * the function outputs) or just rescan them. We must recompute if an * expression contains changed parameters, else we rescan. * * XXX maybe we shouldrecompute if the function is volatile? But in * general the executor doesn't conditionalize its actions on that. So you get some random number of generate_series output rows on the first call, but then each subsequent run of the subquery just rescans those same rows. I do not think this is wrong or bad, really; if it was done differently, examples such as this same generate_series call on the inside of a nestloop join would behave very strangely. regards, tom lane
"David G. Johnston" <david.g.johnston@gmail.com> writes: > A correlated subquery, on the other hand, has to be called once for every > row and is evaluated within the context supplied by said row. Each time > random is called it returns a new value. > Section 4.2.11 (9.6 docs) > https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES > Maybe this could be worded better but the first part talks about a single > execution while "any one execution" is mentioned in reference to "the > surrounding query". > I do think that defining "correlated" and "non-correlated" subqueries > within this section would be worthwhile. Hmm ... a quick look around says we don't define or use those terms anywhere. I agree this could stand to be addressed somewhere, but I'm not sure if 4.2.11 is the most appropriate place. I don't think the issue is unique to scalar subqueries. regards, tom lane