Обсуждение: plpython does not honour max-rows
Hi
I came across this when developing a sampling function using plpy.execute that needs to be able to sample zero rows. What actually happens is that zero is ignored for max-rows and all rows are returned. Test case below:-
Many thanks
Kieran
drop table if exists _test_max_rows;
create table _test_max_rows (i integer);
insert into _test_max_rows
select *
from generate_series(1, 100);
create or replace function plpython3u_execute_max_row(max_rows integer)
returns setof integer
language plpython3u
as $$
for row in plpy.execute('select * from _test_max_rows', max_rows):
yield row['i']
$$;
-- Correctly returns 10 rows
select * from plpython3u_execute_max_row(10);
-- Incorrectly returns all 100 rows
select * from plpython3u_execute_max_row(0)
create table _test_max_rows (i integer);
insert into _test_max_rows
select *
from generate_series(1, 100);
create or replace function plpython3u_execute_max_row(max_rows integer)
returns setof integer
language plpython3u
as $$
for row in plpy.execute('select * from _test_max_rows', max_rows):
yield row['i']
$$;
-- Correctly returns 10 rows
select * from plpython3u_execute_max_row(10);
-- Incorrectly returns all 100 rows
select * from plpython3u_execute_max_row(0)
> On 2 May 2023, at 12:30, Kieran McCusker <kieran.mccusker@gmail.com> wrote: > I came across this when developing a sampling function using plpy.execute that needs to be able to sample zero rows. Whatactually happens is that zero is ignored for max-rows and all rows are returned. A max_rows of less than or equal to zero is IIRC interpreted as "fetch all rows". I think this works as intended, is it documented anywhere to work in another way? -- Daniel Gustafsson
Thanks for the quick response. Chapter 46.6.1 says that max-rows is an optional row limit. Unless I missed it there is nothing in the documentation about zero meaning all rows. Wouldn't it rather be like SQL LIMIT 0 meaning all rows?
Anyway it was surprising gotcha, but of course easy to code around.
Kieran
On Tue, 2 May 2023, 12:01 Daniel Gustafsson, <daniel@yesql.se> wrote:
> On 2 May 2023, at 12:30, Kieran McCusker <kieran.mccusker@gmail.com> wrote:
> I came across this when developing a sampling function using plpy.execute that needs to be able to sample zero rows. What actually happens is that zero is ignored for max-rows and all rows are returned.
A max_rows of less than or equal to zero is IIRC interpreted as "fetch all
rows". I think this works as intended, is it documented anywhere to work in
another way?
--
Daniel Gustafsson
> On 2 May 2023, at 13:37, Kieran McCusker <kieran.mccusker@gmail.com> wrote: > Thanks for the quick response. Chapter 46.6.1 says that max-rows is an optional row limit. Unless I missed it there isnothing in the documentation about zero meaning all rows. Wouldn't it rather be like SQL LIMIT 0 meaning all rows? That does sound like something which we should document, the confusion is easy to see. Thanks for the report. FTR I think I misremembered in my earlier email, it's == 0 and not <= 0 which implies to limit. -- Daniel Gustafsson
Without making too much of a fuss, wouldn't it be simpler to honour a row-limit of zero rather than document that it doesn't work?
On Tue, 2 May 2023 at 12:48, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 May 2023, at 13:37, Kieran McCusker <kieran.mccusker@gmail.com> wrote:
> Thanks for the quick response. Chapter 46.6.1 says that max-rows is an optional row limit. Unless I missed it there is nothing in the documentation about zero meaning all rows. Wouldn't it rather be like SQL LIMIT 0 meaning all rows?
That does sound like something which we should document, the confusion is easy
to see. Thanks for the report.
FTR I think I misremembered in my earlier email, it's == 0 and not <= 0 which
implies to limit.
--
Daniel Gustafsson
Kieran McCusker <kieran.mccusker@gmail.com> writes: > Without making too much of a fuss, wouldn't it be simpler to honour a > row-limit of zero rather than document that it doesn't work? plpy.execute is a thin wrapper around SPI_execute, which does document this point: If <parameter>count</parameter> is zero then the command is executed for all rows that it applies to. If <parameter>count</parameter> is greater than zero, then no more than <parameter>count</parameter> rows will be retrieved; execution stops when the count is reached, much like adding a <literal>LIMIT</literal> clause to the query. Since that's stood for a few decades now, changing it seems impossible from the backwards-compatibility standpoint. However, it does seem appropriate to repeat that material in the wrapper's documentation. I wonder whether the similar plperl and pltcl wrappers are also documentation-shy here. regards, tom lane
> On 2 May 2023, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Since that's stood for a few decades now, changing it seems impossible > from the backwards-compatibility standpoint. Agreed, that's a non-starter. > However, it does seem > appropriate to repeat that material in the wrapper's documentation. > > I wonder whether the similar plperl and pltcl wrappers are also > documentation-shy here. It seems like they are all a bit thin on explaining this. The attached diff copies the wording (which unsurprisingly is pretty good IMO) into the plperl/python/tcl documentation. -- Daniel Gustafsson
Вложения
I wrote: > Since that's stood for a few decades now, changing it seems impossible > from the backwards-compatibility standpoint. However, it does seem > appropriate to repeat that material in the wrapper's documentation. > I wonder whether the similar plperl and pltcl wrappers are also > documentation-shy here. Indeed so. The underlying SPI documentation is solid enough on this point, but the PLs are all misleading, in that they suggest the limit arguments work like "LIMIT n" or "FETCH n", which isn't quite so. I suggest the attached docs patch. regards, tom lane diff --git a/doc/src/sgml/plperl.sgml b/doc/src/sgml/plperl.sgml index 6c81ee8fbe..8ab9602249 100644 --- a/doc/src/sgml/plperl.sgml +++ b/doc/src/sgml/plperl.sgml @@ -441,7 +441,7 @@ use strict; <variablelist> <varlistentry> <term> - <literal><function>spi_exec_query</function>(<replaceable>query</replaceable> [, <replaceable>max-rows</replaceable>])</literal> + <literal><function>spi_exec_query</function>(<replaceable>query</replaceable> [, <replaceable>limit</replaceable>])</literal> <indexterm> <primary>spi_exec_query</primary> <secondary>in PL/Perl</secondary> @@ -449,9 +449,14 @@ use strict; </term> <listitem> <para> - <literal>spi_exec_query</literal> executes an SQL command and -returns the entire row set as a reference to an array of hash -references. <emphasis>You should only use this command when you know + <function>spi_exec_query</function> executes an SQL command and +returns the entire row set as a reference to an array of hash references. +If <replaceable>limit</replaceable> is specified, +then <function>spi_exec_query</function> retrieves at +most <replaceable>limit</replaceable> rows, much as if the query included +a <literal>LIMIT</literal> clause. Omitting <literal>limit</literal> or +specifying it as zero results in no row limit. +<emphasis>You should only use this command when you know that the result set will be relatively small.</emphasis> Here is an example of a query (<command>SELECT</command> command) with the optional maximum number of rows: @@ -643,7 +648,10 @@ $plan = spi_prepare('SELECT * FROM test WHERE id > $1 AND name = $2', by <literal>spi_exec_query</literal>, or in <literal>spi_query_prepared</literal> which returns a cursor exactly as <literal>spi_query</literal> does, which can be later passed to <literal>spi_fetchrow</literal>. The optional second parameter to <literal>spi_exec_prepared</literal> is a hash reference of attributes; - the only attribute currently supported is <literal>limit</literal>, which sets the maximum number of rows returned bya query. + the only attribute currently supported is <literal>limit</literal>, which + sets the maximum number of rows returned from the query. + Omitting <literal>limit</literal> or specifying it as zero results in no + row limit. </para> <para> diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml index e190c90f45..2dd6ea6b32 100644 --- a/doc/src/sgml/plpython.sgml +++ b/doc/src/sgml/plpython.sgml @@ -789,7 +789,7 @@ $$ LANGUAGE plpython3u; <variablelist> <varlistentry> - <term><literal>plpy.<function>execute</function>(<replaceable>query</replaceable> [, <replaceable>max-rows</replaceable>])</literal></term> + <term><literal>plpy.<function>execute</function>(<replaceable>query</replaceable> [, <replaceable>limit</replaceable>])</literal></term> <listitem> <para> Calling <function>plpy.execute</function> with a query string and an @@ -797,6 +797,15 @@ $$ LANGUAGE plpython3u; be returned in a result object. </para> + <para> + If <replaceable>limit</replaceable> is specified, + then <function>plpy.execute</function> retrieves at + most <replaceable>limit</replaceable> rows, much as if the query + included a <literal>LIMIT</literal> + clause. Omitting <literal>limit</literal> or specifying it as zero + results in no row limit. + </para> + <para> The result object emulates a list or dictionary object. The result object can be accessed by row number and column name. For example: @@ -887,7 +896,7 @@ foo = rv[i]["my_column"] <varlistentry> <term><literal>plpy.<function>prepare</function>(<replaceable>query</replaceable> [, <replaceable>argtypes</replaceable>])</literal></term> - <term><literal>plpy.<function>execute</function>(<replaceable>plan</replaceable> [, <replaceable>arguments</replaceable>[, <replaceable>max-rows</replaceable>]])</literal></term> + <term><literal>plpy.<function>execute</function>(<replaceable>plan</replaceable> [, <replaceable>arguments</replaceable>[, <replaceable>limit</replaceable>]])</literal></term> <listitem> <para> <indexterm><primary>preparing a query</primary><secondary>in PL/Python</secondary></indexterm> diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml index bf56ba6b1c..b31f2c1330 100644 --- a/doc/src/sgml/pltcl.sgml +++ b/doc/src/sgml/pltcl.sgml @@ -341,9 +341,11 @@ $$ LANGUAGE pltcl; </para> <para> The optional <literal>-count</literal> value tells - <function>spi_exec</function> the maximum number of rows - to process in the command. The effect of this is comparable to - setting up a query as a cursor and then saying <literal>FETCH <replaceable>n</replaceable></literal>. + <function>spi_exec</function> to stop + once <replaceable>n</replaceable> rows have been retrieved, + much as if the query included a <literal>LIMIT</literal> clause. + If <replaceable>n</replaceable> is zero, the query is run to + completion, the same as when <literal>-count</literal> is omitted. </para> <para> If the command is a <command>SELECT</command> statement, the values of the
Daniel Gustafsson <daniel@yesql.se> writes: >> On 2 May 2023, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wonder whether the similar plperl and pltcl wrappers are also >> documentation-shy here. > It seems like they are all a bit thin on explaining this. The attached diff > copies the wording (which unsurprisingly is pretty good IMO) into the > plperl/python/tcl documentation. Ah, seems like we set to work on this at the same time :-( I thought that s/max-rows/limit/ would be a good idea, mainly because plperl's spi_exec_prepared uses that name as a caller-exposed hash key. I'm not especially concerned about the wording otherwise. regards, tom lane
> On 2 May 2023, at 22:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Daniel Gustafsson <daniel@yesql.se> writes: >>> On 2 May 2023, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I wonder whether the similar plperl and pltcl wrappers are also >>> documentation-shy here. > >> It seems like they are all a bit thin on explaining this. The attached diff >> copies the wording (which unsurprisingly is pretty good IMO) into the >> plperl/python/tcl documentation. > > Ah, seems like we set to work on this at the same time :-( Pretty impressive timing across timezones =) > I thought that s/max-rows/limit/ would be a good idea, I was actually thinking about that but backed off to not confuse things with LIMIT. > mainly because > plperl's spi_exec_prepared uses that name as a caller-exposed hash key. But I didn't realize that, and in light of that I agree that limit is better. > I'm not especially concerned about the wording otherwise. Neither am I, both are fine I think. -- Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes: > On 2 May 2023, at 22:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> mainly because >> plperl's spi_exec_prepared uses that name as a caller-exposed hash key. > But I didn't realize that, and in light of that I agree that limit is better. OK, let's do it like that then. >> I'm not especially concerned about the wording otherwise. > Neither am I, both are fine I think. I'll compare and merge the two patches and push. Thanks for looking at it! regards, tom lane
> On 2 May 2023, at 23:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Daniel Gustafsson <daniel@yesql.se> writes: >> On 2 May 2023, at 22:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> mainly because >>> plperl's spi_exec_prepared uses that name as a caller-exposed hash key. > >> But I didn't realize that, and in light of that I agree that limit is better. > > OK, let's do it like that then. > >>> I'm not especially concerned about the wording otherwise. > >> Neither am I, both are fine I think. > > I'll compare and merge the two patches and push. Thanks for > looking at it! +1, sounds like a good plan. -- Daniel Gustafsson