Обсуждение: Strange random() Correlation

Поиск
Список
Период
Сортировка

Strange random() Correlation

От
Volkan YAZICI
Дата:
Hi,

ISTM, there's a problem in the correlation of random() to outer JOINs.
Here's a test case:

BEGIN;

CREATE TEMP TABLE nuc_codes (id serial, code char(1));

COPY nuc_codes (code) FROM stdin;
A
C
D
G
H
K
M
N
R
S
T
U
V
W
X
Y
\.

SELECT id, code FROM nuc_codes;

SELECT T1.r1, T1.r2, T2.code, T3.code
  FROM (SELECT ((random() * 100)::int4 % 17),
               ((random() * 100)::int4 % 17)
          FROM generate_series(1, 10)
       ) AS T1 (r1, r2)
  LEFT OUTER JOIN nuc_codes T2 ON (T2.id = T1.r1)
  LEFT OUTER JOIN nuc_codes T3 ON (T3.id = T1.r2);

ROLLBACK;

If you run above query, you'll realize the inconsistency in the output.
Furthermore, if you'd append an "OFFSET 0" to subselect, output becomes
more stable but still has some inconsistencies.


Regards.

P.S. Query tested on 8.1.4 and a 2-3 weeks old cvs tip.

Re: Strange random() Correlation

От
Tom Lane
Дата:
Volkan YAZICI <yazicivo@ttnet.net.tr> writes:
> ISTM, there's a problem in the correlation of random() to outer JOINs.

The random() functions are being evaluated more than once because the
subselect gets "flattened" into the outer query, so that you have
the equivalent of

    select random(), ...  where t2.id = random() ...

We've previously discussed preventing the planner from flattening if
there are any volatile functions in the sub-select's output list, but
I think that would probably do about as much harm as good.  The cases
where this actually matters are rare and the programmer knows darn well
what he's doing, so the workaround of inserting an OFFSET 0 to prevent
the flattening seems acceptable to me.

> Furthermore, if you'd append an "OFFSET 0" to subselect, output becomes
> more stable but still has some inconsistencies.

I didn't see any...

            regards, tom lane

Re: Strange random() Correlation

От
Volkan YAZICI
Дата:
On May 27 11:50, Tom Lane wrote:
> Volkan YAZICI <yazicivo@ttnet.net.tr> writes:
> > ISTM, there's a problem in the correlation of random() to outer JOINs.
>
> The random() functions are being evaluated more than once because the
> subselect gets "flattened" into the outer query, so that you have
> the equivalent of
>
>     select random(), ...  where t2.id = random() ...

Oops, sorry. I've just remembered this.

> We've previously discussed preventing the planner from flattening if
> there are any volatile functions in the sub-select's output list, but
> I think that would probably do about as much harm as good.

It can be quite informative to learn the pros and cons of this issue,
but I couldn't find related discussion in archives. I'd be so
appreciated if you can remember its subject or anything specifier for
the thread.

> > Furthermore, if you'd append an "OFFSET 0" to subselect, output becomes
> > more stable but still has some inconsistencies.
>
> I didn't see any...

That's all caused by a mis-interpretation of the output by me. Replacing
"(random() * 100)::int % 17" with "1 + ((random() * 100)::int % 16)"
solved my above question.


Regards.