Обсуждение: some random() clarification needed

Поиск
Список
Период
Сортировка

some random() clarification needed

От
Marc Millas
Дата:
Hi,

when, in psql, on a postgres 12.3, I write:
select ceiling(random()*2582);
it does provide the expected answer, ie. a number between 1 and 2582, inclusive.
allways.
when I decide to use this to get a random row within a table prenoms having 2 columns
a id serial, and a prenom varchar, with explicitly 2582 lines, no gaps,
I write:
select id, prenom from prenoms where id=ceiling(random()*2582);

expecting to get, allways, one line.
But its not the case.
around 15% of time I get 0 lines which is already quite strange to me.
but 10% of time, I get a random number of lines, until now up to 4.
even weirder (to me !)

so, can someone please clarify ?

thanks,
regards,


Marc MILLAS
Senior Architect
+33607850334

Re: some random() clarification needed

От
"David G. Johnston"
Дата:
On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com> wrote:
select id, prenom from prenoms where id=ceiling(random()*2582);

expecting to get, allways, one line.
But its not the case.
around 15% of time I get 0 lines which is already quite strange to me.
but 10% of time, I get a random number of lines, until now up to 4.
even weirder (to me !)

so, can someone please clarify ?


You are basically asking:

For each row in my table compare the id to some random number and if they match return that row, otherwise skip it.  The random number being compared to is different for each row because random() is volatile and thus evaluated for each row.

David J.

Re: some random() clarification needed

От
Marc Millas
Дата:
Ok, thanks for the clarification. 

Marc MILLAS
Senior Architect
+33607850334



On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com> wrote:
select id, prenom from prenoms where id=ceiling(random()*2582);

expecting to get, allways, one line.
But its not the case.
around 15% of time I get 0 lines which is already quite strange to me.
but 10% of time, I get a random number of lines, until now up to 4.
even weirder (to me !)

so, can someone please clarify ?


You are basically asking:

For each row in my table compare the id to some random number and if they match return that row, otherwise skip it.  The random number being compared to is different for each row because random() is volatile and thus evaluated for each row.

David J.

Re: some random() clarification needed

От
Marc Millas
Дата:
Hi,
your answer helps me understand my first problem.
so, I rewrote a simple loop so as to avoid the "volatile" behaviour.
(at least I was thinking I did... looks like I was wrong !)
step by step loop:
DO $$
BEGIN
  FOR counter IN 1..1000 LOOP
begin
declare
id1 integer =ceiling(random()*2582);
id3 date= '2000-01-01';
id2 date;
pren varchar;
begin
id2=id3 + (random()*7200)::integer;
SELECT prenom FROM prenoms WHERE id=id1 into pren;   
INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren);
  end;
end;
END LOOP;
END; $$;

I truncated the table, executed the loop with no errors, and expected that a select count(*) 
may answer 1000 !
no.
it varies, from less than 1000 (much less, something like 900)
and more than 1000 (up to 1094)

so... what s "volatile" in the loop ?

BTW the testparttransac table is partitioned on datenaissance, with a default partition.

thanks,
regards,


Marc MILLAS
Senior Architect
+33607850334



On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com> wrote:
select id, prenom from prenoms where id=ceiling(random()*2582);

expecting to get, allways, one line.
But its not the case.
around 15% of time I get 0 lines which is already quite strange to me.
but 10% of time, I get a random number of lines, until now up to 4.
even weirder (to me !)

so, can someone please clarify ?


You are basically asking:

For each row in my table compare the id to some random number and if they match return that row, otherwise skip it.  The random number being compared to is different for each row because random() is volatile and thus evaluated for each row.

David J.

Re: some random() clarification needed

От
"David G. Johnston"
Дата:
Please don't top-post.  Inline (with trim) is better but at minimum bottom-post.

On Tue, Jul 14, 2020 at 9:01 AM Marc Millas <marc.millas@mokadb.com> wrote:
Hi,
your answer helps me understand my first problem.
so, I rewrote a simple loop so as to avoid the "volatile" behaviour.
(at least I was thinking I did... looks like I was wrong !)
step by step loop:
DO $$
BEGIN
  FOR counter IN 1..1000 LOOP
begin
declare
id1 integer =ceiling(random()*2582);
id3 date= '2000-01-01';
id2 date;
pren varchar;
begin
id2=id3 + (random()*7200)::integer;
SELECT prenom FROM prenoms WHERE id=id1 into pren;   
INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren);
  end;
end;
END LOOP;
END; $$;

I truncated the table, executed the loop with no errors, and expected that a select count(*) 
may answer 1000 !
no.
it varies, from less than 1000 (much less, something like 900)
and more than 1000 (up to 1094)

so... what s "volatile" in the loop ?

Everything...

You are setting id1 to the result of an expression inside the loop.  Everytime that statement gets executed within the loop a new random number is produced.

I mean, even "id 3 date = '2000-01-01'" is repeatedly casting (I think) the string to a date and assigning it to the variable even though that statement overall is effectively immutable.

David J.

Re: some random() clarification needed

От
Adrian Klaver
Дата:
On 7/14/20 9:01 AM, Marc Millas wrote:
> Hi,
> your answer helps me understand my first problem.
> so, I rewrote a simple loop so as to avoid the "volatile" behaviour.
> (at least I was thinking I did... looks like I was wrong !)
> step by step loop:
> DO $$
> BEGIN
>    FOR counter IN 1..1000 LOOP
> begin
> declare
> id1 integer =ceiling(random()*2582);
> id3 date= '2000-01-01';
> id2 date;
> pren varchar;
> begin
> id2=id3 + (random()*7200)::integer;
> SELECT prenom FROM prenoms WHERE id=id1 into pren;
> INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren);
>    end;
> end;
> END LOOP;
> END; $$;
> 
> I truncated the table, executed the loop with no errors, and expected 
> that a select count(*)
> may answer 1000 !
> no.
> it varies, from less than 1000 (much less, something like 900)
> and more than 1000 (up to 1094)
> 
> so... what s "volatile" in the loop ?

I think it has more to do with the structure of the function. Not sure 
how all those nested BEGINs interact, but when I simplify the above to:

DO $$

DECLARE
     id1 integer =ceiling(random()*2582);
     id3 date= '2000-01-01';
     id2 date;
     pren varchar = 'test';
BEGIN

FOR counter IN 1..1000 LOOP
     id2 = id3 + (random()*7200)::integer;
     INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren);
END LOOP;
END; $$;

I get 1000 rows each time I TRUNCATE testparttransac and then run above 
and repeat.

> 
> BTW the testparttransac table is partitioned on datenaissance, with a 
> default partition.
> 
> thanks,
> regards,
> 
> 
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com <http://www.mokadb.com>
> 
> 
> 
> On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston 
> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
> 
>     On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com
>     <mailto:marc.millas@mokadb.com>> wrote:
> 
>         select id, prenom from prenoms where id=ceiling(random()*2582);
> 
>         expecting to get, allways, one line.
>         But its not the case.
>         around 15% of time I get 0 lines which is already quite strange
>         to me.
>         but 10% of time, I get a random number of lines, until now up to 4.
>         even weirder (to me !)
> 
>         so, can someone please clarify ?
> 
> 
>     You are basically asking:
> 
>     For each row in my table compare the id to some random number and if
>     they match return that row, otherwise skip it.  The random number
>     being compared to is different for each row because random() is
>     volatile and thus evaluated for each row.
> 
>     David J.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: some random() clarification needed

От
David Rowley
Дата:
On Wed, 15 Jul 2020 at 04:01, Marc Millas <marc.millas@mokadb.com> wrote:
> your answer helps me understand my first problem.
> so, I rewrote a simple loop so as to avoid the "volatile" behaviour.

Not sure what you're trying to do with the plpgsql, but you can just
escape the multiple evaluations by putting the volatile function in a
sub-select with no FROM clause.

SELECT ... FROM ... WHERE id = (SELECT ceiling(random()*2582));

Or the more traditional way to get a random row is:

SELECT ... FROM ... WHERE id BETWEEN 0 AND 2585 ORDER BY random() LIMIT 1;

David



Re: some random() clarification needed

От
Marc Millas
Дата:
Thanks! 
makes it clearer :-)
its not that obvious to guess the consequences of the "volatile" behaviour.
regards,


Marc MILLAS
Senior Architect
+33607850334



On Wed, Jul 15, 2020 at 1:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 15 Jul 2020 at 04:01, Marc Millas <marc.millas@mokadb.com> wrote:
> your answer helps me understand my first problem.
> so, I rewrote a simple loop so as to avoid the "volatile" behaviour.

Not sure what you're trying to do with the plpgsql, but you can just
escape the multiple evaluations by putting the volatile function in a
sub-select with no FROM clause.

SELECT ... FROM ... WHERE id = (SELECT ceiling(random()*2582));

Or the more traditional way to get a random row is:

SELECT ... FROM ... WHERE id BETWEEN 0 AND 2585 ORDER BY random() LIMIT 1;

David