Обсуждение: BUG #18430: syntax error when using aggregate function in where clause of subquery

Поиск
Список
Период
Сортировка

BUG #18430: syntax error when using aggregate function in where clause of subquery

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18430
Logged by:          Eric Atkin
Email address:      eatkin@certusllc.us
PostgreSQL version: 16.2
Operating system:   Arch Linux
Description:

CREATE TABLE driver (
    id INTEGER PRIMARY KEY,
    city TEXT
);

CREATE TABLE delivery (
    id INTEGER PRIMARY KEY,
    driver_id INTEGER REFERENCES driver(id)
);

SELECT
    city,
    (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))
AS deliveries
FROM driver
GROUP BY city
;

This produces:

ERROR:  syntax error at or near "array_agg"
LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(...

I assume this is because aggregate functions are not normally allowed in a
where clause. However, my reading of the last paragraph of Section 4.2.7 of
the version 16 docs (quoted below) leads me to believe there should be an
exception to that constraint when a subquery occurs in an outer query select
list and refers to one of its variables. Is it possible there is a bug in
the parser causing it to not be aware of this exception?

Perhaps a join of delivery would be a simpler way to write this query in
this trivial example, but I think the subquery approach is more reasonable
in the real world case I've derived it from. In any case, I think postgres
should behave as documented even if I can only poorly demonstrate the
issue.

Thank you,
Eric Atkin

4.2.7 Aggregate Expressions

...

When an aggregate expression appears in a subquery (see Section 4.2.11 and
Section 9.23), the aggregate is normally evaluated over the rows of the
subquery. But an exception occurs if the aggregate's arguments (and
filter_clause if any) contain only outer-level variables: the aggregate then
belongs to the nearest such outer level, and is evaluated over the rows of
that query. The aggregate expression as a whole is then an outer reference
for the subquery it appears in, and acts as a constant over any one
evaluation of that subquery. The restriction about appearing only in the
result list or HAVING clause applies with respect to the query level that
the aggregate belongs to.


On Fri, 2024-04-12 at 17:14 +0000, PG Bug reporting form wrote:
> SELECT
>     city,
>     (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))
> AS deliveries
> FROM driver
> GROUP BY city
> ;
>
> This produces:
>
> ERROR:  syntax error at or near "array_agg"
> LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(...

This not a bug, but bad syntax.

Write

  SELECT city,
         (SELECT count(*)
          FROM delivery
          WHERE driver_id = ANY (drivers)) AS deliveries
  FROM (SELECT city,
               array_agg(driver.id) AS drivers
        FROM driver
        GROUP BY city) AS q;

Yours,
Laurenz Albe



"The aggregate expression as a whole is then an outer reference
for the subquery it appears in, and acts as a constant over any one
evaluation of that subquery."

It seems the aggregate function call should be evaluated at the outer layer (where it would not be bad syntax) and then is a constant for the inner query where clause where an array would be allowed.

Thank you for your working example. There is often more than one way to write any particular query. My version and yours seem to mean the same thing, but one is arguably more readable. At the very least, is this not a documentation bug?

On Fri, Apr 12, 2024 at 11:45 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-04-12 at 17:14 +0000, PG Bug reporting form wrote:
> SELECT
>     city,
>     (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))
> AS deliveries
> FROM driver
> GROUP BY city
> ;
>
> This produces:
>
> ERROR:  syntax error at or near "array_agg"
> LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(...

This not a bug, but bad syntax.

Write

  SELECT city,
         (SELECT count(*)
          FROM delivery
          WHERE driver_id = ANY (drivers)) AS deliveries
  FROM (SELECT city,
               array_agg(driver.id) AS drivers
        FROM driver
        GROUP BY city) AS q;

Yours,
Laurenz Albe

Re: BUG #18430: syntax error when using aggregate function in where clause of subquery

От
"David G. Johnston"
Дата:
On Fri, Apr 12, 2024 at 10:32 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      18430
Logged by:          Eric Atkin
Email address:      eatkin@certusllc.us
PostgreSQL version: 16.2
Operating system:   Arch Linux
Description:       

    (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))


This doesn't seem to have anything to do with scoping of inner and outer queries.  Where did you come to believe that:

scalar IN array

is a valid operator/expression.

i.e., your query fails for the same reason this one does:

select 1 in array[1,2,3];

David J.

Eric Atkin <eatkin@certusllc.us> writes:
> It seems the aggregate function call should be evaluated at the outer layer
> (where it would not be bad syntax) and then is a constant for the inner
> query where clause where an array would be allowed.

Correct, but the problem is not with the array_agg call, it's with
your use of IN.  In the first place, IN requires parens around its
righthand side.  But that only gets us past "syntax error":

regression=# SELECT
    city,
    (SELECT count(*) FROM delivery WHERE driver_id IN (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
ERROR:  operator does not exist: integer = integer[]
LINE 3: ...   (SELECT count(*) FROM delivery WHERE driver_id IN (array_...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

That's because what's inside the parens is supposed to be either a
sub-select or a list of things directly comparable to the LHS.
To do what you're after, you need to use the "scalar = ANY(array)"
construct:

regression=# SELECT
    city,
    (SELECT count(*) FROM delivery WHERE driver_id = any (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
 city | deliveries
------+------------
(0 rows)

IN and =ANY are more-or-less equivalent when the RHS is a sub-select,
but not for an array RHS.

There's nothing particularly wrong with Laurenz's construction,
but it's not necessary to split it up like that.

            regards, tom lane



Perfect. I had tried the `= ANY` construct but got the same syntax error and mistakenly assumed it was a problem with the parser rather than my misreading of the `ANY (array)` docs (9.24.3).
Thank you both for your help.

On Fri, Apr 12, 2024 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Atkin <eatkin@certusllc.us> writes:
> It seems the aggregate function call should be evaluated at the outer layer
> (where it would not be bad syntax) and then is a constant for the inner
> query where clause where an array would be allowed.

Correct, but the problem is not with the array_agg call, it's with
your use of IN.  In the first place, IN requires parens around its
righthand side.  But that only gets us past "syntax error":

regression=# SELECT
    city,
    (SELECT count(*) FROM delivery WHERE driver_id IN (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
ERROR:  operator does not exist: integer = integer[]
LINE 3: ...   (SELECT count(*) FROM delivery WHERE driver_id IN (array_...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

That's because what's inside the parens is supposed to be either a
sub-select or a list of things directly comparable to the LHS.
To do what you're after, you need to use the "scalar = ANY(array)"
construct:

regression=# SELECT
    city,
    (SELECT count(*) FROM delivery WHERE driver_id = any (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
 city | deliveries
------+------------
(0 rows)

IN and =ANY are more-or-less equivalent when the RHS is a sub-select,
but not for an array RHS.

There's nothing particularly wrong with Laurenz's construction,
but it's not necessary to split it up like that.

                        regards, tom lane