Обсуждение: [BUGS] BUG #14737: Wrong PL/pgSQL behaviour
The following bug has been logged on the website: Bug reference: 14737 Logged by: Frank Gard Email address: frank.von.postgresql.org@familie-gard.de PostgreSQL version: 9.6.3 Operating system: Debian GNU/Linux Description: As an example for my database programming lecture, I invented the following PL/pgSQL function: DROP TYPE IF EXISTS tp_histogramm CASCADE; CREATE TYPE tp_histogramm AS ( wert INTEGER, absolut INTEGER, relativ DECIMAL( 7, 5 ) ); -- Variante 1 (funktioniert leider nicht korrekt, siehe Erläuterung im unten stehenden Kommentar): CREATE OR REPLACE FUNCTION fn_zufall( p_von INTEGER DEFAULT 1, p_bis INTEGER DEFAULT 6, p_anzahl INTEGER DEFAULT10000 ) RETURNS SETOF tp_histogramm LANGUAGE PLpgSQL AS $body$ DECLARE c_create CONSTANT VARCHAR := 'CREATE TEMPORARY TABLE IF NOT EXISTS ttb_histogramm ( wert INTEGER, anzahl INTEGER ) ON COMMIT DROP'; c_drop CONSTANT VARCHAR := 'DROP TABLE IF EXISTSttb_histogramm CASCADE'; c_histogramm CURSOR FOR SELECT * FROM ttb_histogramm ORDER BY wert ; v_ergebnis tp_histogramm; BEGIN EXECUTE c_drop; EXECUTE c_create; FOR v_wert IN p_von .. p_bis LOOP INSERT INTO ttb_histogramm( wert, anzahl) VALUES ( v_wert, 0 ); END LOOP; FOR v_anzahl IN 1 .. p_anzahl LOOP UPDATE ttb_histogramm SET anzahl = anzahl + 1 WHERE wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM()) ; END LOOP; FOR r_histogramm IN c_histogramm LOOP v_ergebnis.wert := r_histogramm.wert; v_ergebnis.absolut:= r_histogramm.anzahl; v_ergebnis.relativ := ROUND( CAST( r_histogramm.anzahl AS NUMERIC ) / p_anzahl, 5 ); RETURN NEXT v_ergebnis; END LOOP; EXECUTE c_drop; RETURN; END; $body$; When calling SELECT SUM( absolut ) FROM fn_zufall(); I'd expect PostgreSQL giving me the number of iterations, in this case 10000. Similarly, SELECT SUM( absolut ) FROM fn_zufall( p_anzahl := 100 ); should always give 100. Unfortunately this is not the case. When calling it multiple times, it returns numbers smaller and greater, and always different values for each call. Very strange (to me)!!! When I change my function a little bit, writing the random number into an INTEGER variable, and using this variable within my UPDATE statement, everything works fine: CREATE OR REPLACE FUNCTION fn_zufall( p_von INTEGER DEFAULT 1, p_bis INTEGER DEFAULT 6, p_anzahl INTEGER DEFAULT10000 ) RETURNS SETOF tp_histogramm LANGUAGE PLpgSQL AS $body$ DECLARE c_create CONSTANT VARCHAR := 'CREATE TEMPORARY TABLE IF NOT EXISTS ttb_histogramm ( wert INTEGER, anzahl INTEGER ) ON COMMIT DROP'; c_drop CONSTANT VARCHAR := 'DROP TABLE IF EXISTSttb_histogramm CASCADE'; c_histogramm CURSOR FOR SELECT * FROM ttb_histogramm ORDER BY wert ; v_ergebnis tp_histogramm; v_zufall INTEGER; BEGIN EXECUTE c_drop; EXECUTE c_create; FOR v_wert IN p_von .. p_bis LOOP INSERT INTO ttb_histogramm( wert, anzahl) VALUES ( v_wert, 0 ); END LOOP; FOR v_anzahl IN 1 .. p_anzahl LOOP v_zufall := p_von + FLOOR( ( 1 + p_bis- p_von ) * RANDOM() ); UPDATE ttb_histogramm SET anzahl = anzahl + 1 WHERE wert = v_zufall ; END LOOP; FOR r_histogramm IN c_histogramm LOOP v_ergebnis.wert := r_histogramm.wert; v_ergebnis.absolut := r_histogramm.anzahl; v_ergebnis.relativ := ROUND( CAST( r_histogramm.anzahlAS NUMERIC ) / p_anzahl, 5 ); RETURN NEXT v_ergebnis; END LOOP; EXECUTE c_drop; RETURN; END; $body$; Now, SELECTs like the ones mentioned above work as expected. I tried PostgreSQL in various versions (including 10beta1) and on several versions of Debian GNU/Linux, but the result is always the same. Not using "ON COMMIT DROP" doesn't help. Also, I tried to avoid the EXECUTE statements within PL/pgSQL by creating the (temporary or "normal") table outside the function before using the stored procedure, but no luck. So, I believe this behaviour to be a bug within PostgreSQL, isn't it? Cheers, Frank. P.S.: Thanks to Benjamin Mertens, one of my students, who discovered the problem. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Bug reference: 14737
Logged by: Frank Gard
Email address: frank.von.postgresql.org@familie-gard.de
PostgreSQL version: 9.6.3
Operating system: Debian GNU/Linux
Not a bug - RANDOM() is a volatile function so it gets evaluated once per row.
UPDATE ttb_histogramm
SET
anzahl = anzahl + 1
WHERE
wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() )
Unfortunately this is not the case. When calling it
multiple times, it returns numbers smaller and greater, and always different
values for each call. Very strange (to me)!!!
Which is the symptom one will see if, for every row, the value of random is different.
When I change my function a little bit, writing the random number into an
INTEGER variable, and using this variable within my UPDATE statement,
everything works fine:
Then this is what you should do.
David J.
Hi David, thanks for your instant reply. Unfortunately, I disagree your rating that this behaviour is not a bug. The problem here is not that RANDOM() has different values for each row. This is exactly the expected behaviour. But if I do a numerical FOR loop iterating the range from 1 to p_anzahl, and every time I do an UPDATE which increments a "counter", then after the loop, the counter should equal to p_anzahl (and nothing else). My function simulates throwing the dice for p_anzahl times, counting how often each of the dots you have as the result, and returns the distribution of the results (… times 1 dot, … times 2 dots, and so on) as a table. In addition, the result should not differ at all between the two variants of the function. What makes, semantically, the difference? There's none at all! For your convenience, here I add the diff between the two variants: --- variant1.sql 2017-07-07 22:40:44.308024705 +0200 +++ variant2.sql 2017-07-07 22:40:36.975977660 +0200 @@ -13,6 +13,7 @@ SELECT * FROM ttb_histogramm ORDER BY wert ; v_ergebnis tp_histogramm; + v_zufall INTEGER;BEGIN EXECUTE c_drop; EXECUTE c_create; @@ -20,11 +21,12 @@ INSERT INTO ttb_histogramm( wert, anzahl ) VALUES ( v_wert, 0 ); END LOOP; FOR v_anzahlIN 1 .. p_anzahl LOOP + v_zufall := p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() ); UPDATE ttb_histogramm SET anzahl = anzahl + 1 WHERE - wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() ) + wert = v_zufall ; END LOOP; FOR r_histogramm IN c_histogramm LOOP Thanks a lot, Frank. Am 07.07.2017 um 18:28 schrieb David G. Johnston: > On Fri, Jul 7, 2017 at 9:10 AM, <frank.von.postgresql.org@familie-gard.de <mailto:frank.von.postgresql.org@familie-gard.de>>wrote: > > Bug reference: 14737 > Logged by: Frank Gard > Email address: frank.von.postgresql.org@familie-gard.de <mailto:frank.von.postgresql.org@familie-gard.de> > PostgreSQL version: 9.6.3 > Operating system: Debian GNU/Linux > > > Not a bug - RANDOM() is a volatile function so it gets evaluated once per row. > > UPDATE ttb_histogramm > SET > anzahl = anzahl + 1 > WHERE > wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() ) > > > Unfortunately this is not the case. When calling it > multiple times, it returns numbers smaller and greater, and always different > values for each call. Very strange (to me)!!! > > > Which is the symptom one will see if, for every row, the value of random is different. > > > When I change my function a little bit, writing the random number into an > INTEGER variable, and using this variable within my UPDATE statement, > everything works fine: > > > Then this is what you should do. > > David J. >
In addition, the result should not differ at all between the two variants of
the function. What makes, semantically, the difference? There's none at all!
FOR v_anzahl IN 1 .. p_anzahl LOOP[...]
- wert = p_von + FLOOR( ( 1 + p_bis - p_von ) * RANDOM() )
+ wert = v_zufall
SELECT gs, random()
FROM generate_series(1,10) ser (gs)
The reason this is not a bug is for the same reason the value of random() in the above query is different for every row.
David J.
Frank Gard <frank@familie-gard.de> writes: > The problem here is not that RANDOM() has different values for each row. > This is exactly the expected behaviour. But if I do a numerical FOR loop > iterating the range from 1 to p_anzahl, and every time I do an UPDATE > which increments a "counter", then after the loop, the counter should > equal to p_anzahl (and nothing else). The flaw in that argument is the assumption that each execution of UPDATE updates exactly one row. But since RANDOM() is recomputed at each row, what you really have is a stochastic decision whether to update that row, and so the UPDATE could well update more or fewer than one row overall. > In addition, the result should not differ at all between the two variants of > the function. What makes, semantically, the difference? There's none at all! The point is exactly that a WHERE condition is (notionally) computed at each table row, and that includes re-evaluating any functions it calls. The DBMS can often optimize away some of that computation, but in this case it cannot because of the volatility of RANDOM(). The SQL standard is pretty explicit that this is the intended semantics --- it says "SC is effectively evaluated for each row of T". (In the 2011 edition, this is in 14.14 <update statement: searched>, general rule 5.) regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Hello Tom, hello David, thanks a lot for your explanations and your patience. You're right, of course. Sometimes, one cannot see the obvious things anymore :) . You all do a great job! Cheers, Frank.