Обсуждение: [BUGS] BUG #14737: Wrong PL/pgSQL behaviour

Поиск
Список
Период
Сортировка

[BUGS] BUG #14737: Wrong PL/pgSQL behaviour

От
frank.von.postgresql.org@familie-gard.de
Дата:
The following bug has been logged on the website:

Bug reference:      14737
Logged by:          Frank Gard
Email address:      frank.von.postgresql.org@familie-gard.de
PostgreSQL version: 9.6.3
Operating system:   Debian GNU/Linux
Description:

As an example for my database programming lecture, I invented the following
PL/pgSQL function:
DROP TYPE IF EXISTS tp_histogramm CASCADE;
CREATE TYPE tp_histogramm AS  (   wert    INTEGER,   absolut INTEGER,   relativ DECIMAL( 7, 5 )
);

-- Variante 1 (funktioniert leider nicht korrekt, siehe Erläuterung im unten
stehenden Kommentar):
CREATE OR REPLACE FUNCTION fn_zufall(   p_von    INTEGER DEFAULT 1,   p_bis    INTEGER DEFAULT 6,   p_anzahl INTEGER
DEFAULT10000
 
)
RETURNS SETOF tp_histogramm
LANGUAGE PLpgSQL
AS $body$
DECLARE   c_create     CONSTANT VARCHAR := 'CREATE TEMPORARY TABLE IF NOT EXISTS
ttb_histogramm ( wert INTEGER, anzahl INTEGER ) ON COMMIT DROP';   c_drop       CONSTANT VARCHAR := 'DROP TABLE IF
EXISTSttb_histogramm 
CASCADE';   c_histogramm CURSOR FOR       SELECT * FROM ttb_histogramm ORDER BY wert   ;   v_ergebnis
tp_histogramm;
BEGIN   EXECUTE c_drop;   EXECUTE c_create;   FOR v_wert IN p_von .. p_bis LOOP       INSERT INTO ttb_histogramm( wert,
anzahl) VALUES ( v_wert, 0 );   END LOOP;   FOR v_anzahl IN 1 .. p_anzahl LOOP       UPDATE ttb_histogramm
SET              anzahl = anzahl + 1           WHERE               wert = p_von + FLOOR( ( 1 + p_bis - p_von ) *
RANDOM())       ;   END LOOP;   FOR r_histogramm IN c_histogramm LOOP       v_ergebnis.wert := r_histogramm.wert;
v_ergebnis.absolut:= r_histogramm.anzahl;       v_ergebnis.relativ := ROUND( CAST( r_histogramm.anzahl AS NUMERIC ) 
/ p_anzahl, 5 );       RETURN NEXT v_ergebnis;   END LOOP;   EXECUTE c_drop;   RETURN;
END;
$body$;

When calling
SELECT SUM( absolut ) FROM fn_zufall();
I'd expect PostgreSQL giving me the number of iterations, in this case
10000. Similarly, SELECT SUM( absolut ) FROM fn_zufall( p_anzahl := 100 );
should always give 100. Unfortunately this is not the case. When calling it
multiple times, it returns numbers smaller and greater, and always different
values for each call. Very strange (to me)!!!

When I change my function a little bit, writing the random number into an
INTEGER variable, and using this variable within my UPDATE statement,
everything works fine:

CREATE OR REPLACE FUNCTION fn_zufall(   p_von    INTEGER DEFAULT 1,   p_bis    INTEGER DEFAULT 6,   p_anzahl INTEGER
DEFAULT10000
 
)
RETURNS SETOF tp_histogramm
LANGUAGE PLpgSQL
AS $body$
DECLARE   c_create     CONSTANT VARCHAR := 'CREATE TEMPORARY TABLE IF NOT EXISTS
ttb_histogramm ( wert INTEGER, anzahl INTEGER ) ON COMMIT DROP';   c_drop       CONSTANT VARCHAR := 'DROP TABLE IF
EXISTSttb_histogramm 
CASCADE';   c_histogramm CURSOR FOR       SELECT * FROM ttb_histogramm ORDER BY wert   ;   v_ergebnis
tp_histogramm;  v_zufall              INTEGER;
 
BEGIN   EXECUTE c_drop;   EXECUTE c_create;   FOR v_wert IN p_von .. p_bis LOOP       INSERT INTO ttb_histogramm( wert,
anzahl) VALUES ( v_wert, 0 );   END LOOP;   FOR v_anzahl IN 1 .. p_anzahl LOOP       v_zufall := p_von + FLOOR( ( 1 +
p_bis- p_von ) * RANDOM() );       UPDATE ttb_histogramm           SET               anzahl = anzahl + 1
WHERE              wert = v_zufall       ;   END LOOP;   FOR r_histogramm IN c_histogramm LOOP       v_ergebnis.wert :=
r_histogramm.wert;      v_ergebnis.absolut := r_histogramm.anzahl;       v_ergebnis.relativ := ROUND( CAST(
r_histogramm.anzahlAS NUMERIC ) 
/ p_anzahl, 5 );       RETURN NEXT v_ergebnis;   END LOOP;   EXECUTE c_drop;   RETURN;
END;
$body$;

Now, SELECTs like the ones mentioned above work as expected.

I tried PostgreSQL in various versions (including 10beta1) and on several
versions of Debian GNU/Linux, but the result is always the same. Not using
"ON COMMIT DROP" doesn't help. Also, I tried to avoid the EXECUTE statements
within PL/pgSQL by creating the (temporary or "normal") table outside the
function before using the stored procedure, but no luck.

So, I believe this behaviour to be a bug within PostgreSQL, isn't it?

Cheers,
Frank.

P.S.: Thanks to Benjamin Mertens, one of my students, who discovered the
problem.


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14737: Wrong PL/pgSQL behaviour

От
"David G. Johnston"
Дата:
On Fri, Jul 7, 2017 at 9:10 AM, <frank.von.postgresql.org@familie-gard.de> wrote:
Bug reference:      14737
Logged by:          Frank Gard
Email address:      frank.von.postgresql.org@familie-gard.de
PostgreSQL version: 9.6.3
Operating system:   Debian GNU/Linux

​Not a bug - RANDOM() is a volatile function so it gets evaluated once per row.

        UPDATE ttb_histogramm
            SET
                anzahl = anzahl + 1
            WHERE
                wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() )

Unfortunately this is not the case. When calling it
multiple times, it returns numbers smaller and greater, and always different
values for each call. Very strange (to me)!!!

​Which is the symptom one will see if, for every row, the value of random is different.
When I change my function a little bit, writing the random number into an
INTEGER variable, and using this variable within my UPDATE statement,
everything works fine:

​Then this is what you should do.

​David J.

Re: [BUGS] BUG #14737: Wrong PL/pgSQL behaviour

От
Frank Gard
Дата:
Hi David,

thanks for your instant reply. Unfortunately, I disagree your rating that
this behaviour is not a bug.

The problem here is not that RANDOM() has different values for each row.
This is exactly the expected behaviour. But if I do a numerical FOR loop
iterating the range from 1 to p_anzahl, and every time I do an UPDATE
which increments a "counter", then after the loop, the counter should
equal to p_anzahl (and nothing else).

My function simulates throwing the dice for p_anzahl times, counting how
often each of the dots you have as the result, and returns the distribution
of the results (… times 1 dot, … times 2 dots, and so on) as a table.

In addition, the result should not differ at all between the two variants of
the function. What makes, semantically, the difference? There's none at all!

For your convenience, here I add the diff between the two variants:
--- variant1.sql        2017-07-07 22:40:44.308024705 +0200
+++ variant2.sql        2017-07-07 22:40:36.975977660 +0200
@@ -13,6 +13,7 @@        SELECT * FROM ttb_histogramm ORDER BY wert    ;    v_ergebnis            tp_histogramm;
+    v_zufall              INTEGER;BEGIN    EXECUTE c_drop;    EXECUTE c_create;
@@ -20,11 +21,12 @@        INSERT INTO ttb_histogramm( wert, anzahl ) VALUES ( v_wert, 0 );    END LOOP;    FOR
v_anzahlIN 1 .. p_anzahl LOOP 
+        v_zufall := p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() );        UPDATE ttb_histogramm            SET
          anzahl = anzahl + 1            WHERE 
-                wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() )
+                wert = v_zufall        ;    END LOOP;    FOR r_histogramm IN c_histogramm LOOP


Thanks a lot,
Frank.


Am 07.07.2017 um 18:28 schrieb David G. Johnston:
> On Fri, Jul 7, 2017 at 9:10 AM, <frank.von.postgresql.org@familie-gard.de
<mailto:frank.von.postgresql.org@familie-gard.de>>wrote:
>
>     Bug reference:      14737
>     Logged by:          Frank Gard
>     Email address:      frank.von.postgresql.org@familie-gard.de <mailto:frank.von.postgresql.org@familie-gard.de>
>     PostgreSQL version: 9.6.3
>     Operating system:   Debian GNU/Linux
>
>
> ​Not a bug - RANDOM() is a volatile function so it gets evaluated once per row.
>
>             UPDATE ttb_histogramm
>                 SET
>                     anzahl = anzahl + 1
>                 WHERE
>                     wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() )
>
>
>     Unfortunately this is not the case. When calling it
>     multiple times, it returns numbers smaller and greater, and always different
>     values for each call. Very strange (to me)!!!
>
>
> ​Which is the symptom one will see if, for every row, the value of random is different.
> ​
>
>     When I change my function a little bit, writing the random number into an
>     INTEGER variable, and using this variable within my UPDATE statement,
>     everything works fine:
>
>
> ​Then this is what you should do.
>
> ​David J.
>




Re: [BUGS] BUG #14737: Wrong PL/pgSQL behaviour

От
"David G. Johnston"
Дата:
On Fri, Jul 7, 2017 at 1:49 PM, Frank Gard <frank@familie-gard.de> wrote:
In addition, the result should not differ at all between the two variants of
the function. What makes, semantically, the difference? There's none at all!

     FOR v_anzahl IN 1 .. p_anzahl LOOP
​[...]​

-                wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() )
+                wert = v_zufall

SELECT gs, random()
FROM generate_series(1,10) ser (gs)

The reason this is not a bug is for the same reason the value of random() in the above query is different for every row.

David J.

Re: [BUGS] BUG #14737: Wrong PL/pgSQL behaviour

От
Tom Lane
Дата:
Frank Gard <frank@familie-gard.de> writes:
> The problem here is not that RANDOM() has different values for each row.
> This is exactly the expected behaviour. But if I do a numerical FOR loop
> iterating the range from 1 to p_anzahl, and every time I do an UPDATE
> which increments a "counter", then after the loop, the counter should
> equal to p_anzahl (and nothing else).

The flaw in that argument is the assumption that each execution of UPDATE
updates exactly one row.  But since RANDOM() is recomputed at each row,
what you really have is a stochastic decision whether to update that row,
and so the UPDATE could well update more or fewer than one row overall.

> In addition, the result should not differ at all between the two variants of
> the function. What makes, semantically, the difference? There's none at all!

The point is exactly that a WHERE condition is (notionally) computed at
each table row, and that includes re-evaluating any functions it calls.
The DBMS can often optimize away some of that computation, but in this
case it cannot because of the volatility of RANDOM().

The SQL standard is pretty explicit that this is the intended semantics
--- it says "SC is effectively evaluated for each row of T".  (In the 2011
edition, this is in 14.14 <update statement: searched>, general rule 5.)
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14737: Wrong PL/pgSQL behaviour

От
Frank Gard
Дата:
Hello Tom, hello David,

thanks a lot for your explanations and your patience. You're right, of course.
Sometimes, one cannot see the obvious things anymore :) .

You all do a great job!

Cheers,
Frank.