Обсуждение: plpython does not honour max-rows

Поиск

Список

Период

Сортировка

plpython does not honour max-rows

От

Kieran McCusker

Дата:

02 мая 2023 г., 13:30:16

I came across this when developing a sampling function using plpy.execute that needs to be able to sample zero rows. What actually happens is that zero is ignored for max-rows and all rows are returned. Test case below:-

Many thanks

Kieran

drop table if exists _test_max_rows;
create table _test_max_rows (i integer);
insert into _test_max_rows
select *
from generate_series(1, 100);

create or replace function plpython3u_execute_max_row(max_rows integer)
returns setof integer
language plpython3u
as $$
for row in plpy.execute('select * from _test_max_rows', max_rows):
yield row['i']
$$;
-- Correctly returns 10 rows
select * from plpython3u_execute_max_row(10);

-- Incorrectly returns all 100 rows
select * from plpython3u_execute_max_row(0)

Re: plpython does not honour max-rows

От

Daniel Gustafsson

Дата:

02 мая 2023 г., 14:01:00

> On 2 May 2023, at 12:30, Kieran McCusker <kieran.mccusker@gmail.com> wrote:

> I came across this when developing a sampling function using plpy.execute that needs to be able to sample zero rows.
Whatactually happens is that zero is ignored for max-rows and all rows are returned. 

A max_rows of less than or equal to zero is IIRC interpreted as "fetch all
rows".  I think this works as intended, is it documented anywhere to work in
another way?

--
Daniel Gustafsson

Re: plpython does not honour max-rows

От

Kieran McCusker

Дата:

02 мая 2023 г., 14:37:00

Thanks for the quick response. Chapter 46.6.1 says that max-rows is an optional row limit. Unless I missed it there is nothing in the documentation about zero meaning all rows. Wouldn't it rather be like SQL LIMIT 0 meaning all rows?

Anyway it was surprising gotcha, but of course easy to code around.

Kieran

On Tue, 2 May 2023, 12:01 Daniel Gustafsson, <daniel@yesql.se> wrote:

> On 2 May 2023, at 12:30, Kieran McCusker <kieran.mccusker@gmail.com> wrote:

> I came across this when developing a sampling function using plpy.execute that needs to be able to sample zero rows. What actually happens is that zero is ignored for max-rows and all rows are returned.

A max_rows of less than or equal to zero is IIRC interpreted as "fetch all
rows". I think this works as intended, is it documented anywhere to work in
another way?

--
Daniel Gustafsson

Re: plpython does not honour max-rows

От

Daniel Gustafsson

Дата:

02 мая 2023 г., 14:48:46

> On 2 May 2023, at 13:37, Kieran McCusker <kieran.mccusker@gmail.com> wrote:

> Thanks for the quick response. Chapter 46.6.1 says that max-rows is an optional row limit. Unless I missed it there
isnothing in the documentation about zero meaning all rows. Wouldn't it rather be like SQL LIMIT 0 meaning all rows? 

That does sound like something which we should document, the confusion is easy
to see. Thanks for the report.

FTR I think I misremembered in my earlier email, it's == 0 and not <= 0 which
implies to limit.

--
Daniel Gustafsson

Re: plpython does not honour max-rows

От

Kieran McCusker

Дата:

02 мая 2023 г., 14:51:06

Without making too much of a fuss, wouldn't it be simpler to honour a row-limit of zero rather than document that it doesn't work?

On Tue, 2 May 2023 at 12:48, Daniel Gustafsson <daniel@yesql.se> wrote:

> On 2 May 2023, at 13:37, Kieran McCusker <kieran.mccusker@gmail.com> wrote:

> Thanks for the quick response. Chapter 46.6.1 says that max-rows is an optional row limit. Unless I missed it there is nothing in the documentation about zero meaning all rows. Wouldn't it rather be like SQL LIMIT 0 meaning all rows?

That does sound like something which we should document, the confusion is easy
to see. Thanks for the report.

FTR I think I misremembered in my earlier email, it's == 0 and not <= 0 which
implies to limit.

--
Daniel Gustafsson

Re: plpython does not honour max-rows

От

Tom Lane

Дата:

02 мая 2023 г., 17:02:30

Kieran McCusker <kieran.mccusker@gmail.com> writes:
> Without making too much of a fuss, wouldn't it be simpler to honour a
> row-limit of zero rather than document that it doesn't work?

plpy.execute is a thin wrapper around SPI_execute, which does document
this point:

   If <parameter>count</parameter> is zero then the command is executed
   for all rows that it applies to.  If <parameter>count</parameter>
   is greater than zero, then no more than <parameter>count</parameter> rows
   will be retrieved; execution stops when the count is reached, much like
   adding a <literal>LIMIT</literal> clause to the query.

Since that's stood for a few decades now, changing it seems impossible
from the backwards-compatibility standpoint.  However, it does seem
appropriate to repeat that material in the wrapper's documentation.

I wonder whether the similar plperl and pltcl wrappers are also
documentation-shy here.

            regards, tom lane

Re: plpython does not honour max-rows

От

Daniel Gustafsson

Дата:

02 мая 2023 г., 23:34:04

> On 2 May 2023, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Since that's stood for a few decades now, changing it seems impossible
> from the backwards-compatibility standpoint.

Agreed, that's a non-starter.

> However, it does seem
> appropriate to repeat that material in the wrapper's documentation.
> 
> I wonder whether the similar plperl and pltcl wrappers are also
> documentation-shy here.

It seems like they are all a bit thin on explaining this.  The attached diff
copies the wording (which unsurprisingly is pretty good IMO) into the
plperl/python/tcl documentation.

--
Daniel Gustafsson

Вложения

max_rows.diff

Re: plpython does not honour max-rows

От

Tom Lane

Дата:

02 мая 2023 г., 23:35:45

I wrote:
> Since that's stood for a few decades now, changing it seems impossible
> from the backwards-compatibility standpoint.  However, it does seem
> appropriate to repeat that material in the wrapper's documentation.
> I wonder whether the similar plperl and pltcl wrappers are also
> documentation-shy here.

Indeed so.  The underlying SPI documentation is solid enough on this
point, but the PLs are all misleading, in that they suggest the limit
arguments work like "LIMIT n" or "FETCH n", which isn't quite so.
I suggest the attached docs patch.

            regards, tom lane

diff --git a/doc/src/sgml/plperl.sgml b/doc/src/sgml/plperl.sgml
index 6c81ee8fbe..8ab9602249 100644
--- a/doc/src/sgml/plperl.sgml
+++ b/doc/src/sgml/plperl.sgml
@@ -441,7 +441,7 @@ use strict;
    <variablelist>
     <varlistentry>
      <term>
-      <literal><function>spi_exec_query</function>(<replaceable>query</replaceable> [,
<replaceable>max-rows</replaceable>])</literal>
+      <literal><function>spi_exec_query</function>(<replaceable>query</replaceable> [,
<replaceable>limit</replaceable>])</literal>
       <indexterm>
        <primary>spi_exec_query</primary>
        <secondary>in PL/Perl</secondary>
@@ -449,9 +449,14 @@ use strict;
      </term>
      <listitem>
       <para>
-       <literal>spi_exec_query</literal> executes an SQL command and
-returns the entire row set as a reference to an array of hash
-references.  <emphasis>You should only use this command when you know
+       <function>spi_exec_query</function> executes an SQL command and
+returns the entire row set as a reference to an array of hash references.
+If <replaceable>limit</replaceable> is specified,
+then <function>spi_exec_query</function> retrieves at
+most <replaceable>limit</replaceable> rows, much as if the query included
+a <literal>LIMIT</literal> clause.  Omitting <literal>limit</literal> or
+specifying it as zero results in no row limit.
+<emphasis>You should only use this command when you know
 that the result set will be relatively small.</emphasis>  Here is an
 example of a query (<command>SELECT</command> command) with the
 optional maximum number of rows:
@@ -643,7 +648,10 @@ $plan = spi_prepare('SELECT * FROM test WHERE id > $1 AND name = $2',
     by <literal>spi_exec_query</literal>, or in <literal>spi_query_prepared</literal> which returns a cursor
     exactly as <literal>spi_query</literal> does, which can be later passed to <literal>spi_fetchrow</literal>.
     The optional second parameter to <literal>spi_exec_prepared</literal> is a hash reference of attributes;
-    the only attribute currently supported is <literal>limit</literal>, which sets the maximum number of rows returned
bya query. 
+    the only attribute currently supported is <literal>limit</literal>, which
+    sets the maximum number of rows returned from the query.
+    Omitting <literal>limit</literal> or specifying it as zero results in no
+    row limit.
     </para>

     <para>
diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index e190c90f45..2dd6ea6b32 100644
--- a/doc/src/sgml/plpython.sgml
+++ b/doc/src/sgml/plpython.sgml
@@ -789,7 +789,7 @@ $$ LANGUAGE plpython3u;

   <variablelist>
    <varlistentry>
-    <term><literal>plpy.<function>execute</function>(<replaceable>query</replaceable> [,
<replaceable>max-rows</replaceable>])</literal></term>
+    <term><literal>plpy.<function>execute</function>(<replaceable>query</replaceable> [,
<replaceable>limit</replaceable>])</literal></term>
     <listitem>
      <para>
       Calling <function>plpy.execute</function> with a query string and an
@@ -797,6 +797,15 @@ $$ LANGUAGE plpython3u;
       be returned in a result object.
      </para>

+     <para>
+      If <replaceable>limit</replaceable> is specified,
+      then <function>plpy.execute</function> retrieves at
+      most <replaceable>limit</replaceable> rows, much as if the query
+      included a <literal>LIMIT</literal>
+      clause. Omitting <literal>limit</literal> or specifying it as zero
+      results in no row limit.
+     </para>
+
      <para>
       The result object emulates a list or dictionary object.  The result
       object can be accessed by row number and column name.  For example:
@@ -887,7 +896,7 @@ foo = rv[i]["my_column"]

    <varlistentry>
     <term><literal>plpy.<function>prepare</function>(<replaceable>query</replaceable> [,
<replaceable>argtypes</replaceable>])</literal></term>
-    <term><literal>plpy.<function>execute</function>(<replaceable>plan</replaceable> [,
<replaceable>arguments</replaceable>[, <replaceable>max-rows</replaceable>]])</literal></term> 
+    <term><literal>plpy.<function>execute</function>(<replaceable>plan</replaceable> [,
<replaceable>arguments</replaceable>[, <replaceable>limit</replaceable>]])</literal></term> 
     <listitem>
      <para>
       <indexterm><primary>preparing a query</primary><secondary>in PL/Python</secondary></indexterm>
diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index bf56ba6b1c..b31f2c1330 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -341,9 +341,11 @@ $$ LANGUAGE pltcl;
        </para>
        <para>
         The optional <literal>-count</literal> value tells
-        <function>spi_exec</function> the maximum number of rows
-        to process in the command.  The effect of this is comparable to
-        setting up a query as a cursor and then saying <literal>FETCH <replaceable>n</replaceable></literal>.
+        <function>spi_exec</function> to stop
+        once <replaceable>n</replaceable> rows have been retrieved,
+        much as if the query included a <literal>LIMIT</literal> clause.
+        If <replaceable>n</replaceable> is zero, the query is run to
+        completion, the same as when <literal>-count</literal> is omitted.
        </para>
        <para>
         If the command is a <command>SELECT</command> statement, the values of the

Re: plpython does not honour max-rows

От

Tom Lane

Дата:

02 мая 2023 г., 23:39:15

Daniel Gustafsson <daniel@yesql.se> writes:
>> On 2 May 2023, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder whether the similar plperl and pltcl wrappers are also
>> documentation-shy here.

> It seems like they are all a bit thin on explaining this.  The attached diff
> copies the wording (which unsurprisingly is pretty good IMO) into the
> plperl/python/tcl documentation.

Ah, seems like we set to work on this at the same time :-(

I thought that s/max-rows/limit/ would be a good idea, mainly because
plperl's spi_exec_prepared uses that name as a caller-exposed hash key.
I'm not especially concerned about the wording otherwise.

            regards, tom lane

Re: plpython does not honour max-rows

От

Daniel Gustafsson

Дата:

02 мая 2023 г., 23:50:17

> On 2 May 2023, at 22:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>>> On 2 May 2023, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I wonder whether the similar plperl and pltcl wrappers are also
>>> documentation-shy here.
>
>> It seems like they are all a bit thin on explaining this.  The attached diff
>> copies the wording (which unsurprisingly is pretty good IMO) into the
>> plperl/python/tcl documentation.
>
> Ah, seems like we set to work on this at the same time :-(

Pretty impressive timing across timezones =)

> I thought that s/max-rows/limit/ would be a good idea,

I was actually thinking about that but backed off to not confuse things with
LIMIT.

> mainly because
> plperl's spi_exec_prepared uses that name as a caller-exposed hash key.

But I didn't realize that, and in light of that I agree that limit is better.

> I'm not especially concerned about the wording otherwise.

Neither am I, both are fine I think.

--
Daniel Gustafsson

Re: plpython does not honour max-rows

От

Tom Lane

Дата:

03 мая 2023 г., 00:21:51

Daniel Gustafsson <daniel@yesql.se> writes:
> On 2 May 2023, at 22:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> mainly because
>> plperl's spi_exec_prepared uses that name as a caller-exposed hash key.

> But I didn't realize that, and in light of that I agree that limit is better.

OK, let's do it like that then.

>> I'm not especially concerned about the wording otherwise.

> Neither am I, both are fine I think.

I'll compare and merge the two patches and push.  Thanks for
looking at it!

            regards, tom lane

Re: plpython does not honour max-rows

От

Daniel Gustafsson

Дата:

03 мая 2023 г., 00:23:39

> On 2 May 2023, at 23:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> On 2 May 2023, at 22:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> mainly because
>>> plperl's spi_exec_prepared uses that name as a caller-exposed hash key.
>
>> But I didn't realize that, and in light of that I agree that limit is better.
>
> OK, let's do it like that then.
>
>>> I'm not especially concerned about the wording otherwise.
>
>> Neither am I, both are fine I think.
>
> I'll compare and merge the two patches and push.  Thanks for
> looking at it!

+1, sounds like a good plan.

--
Daniel Gustafsson

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: plpython does not honour max-rows

Вложения