Обсуждение: Bugreport: incorrect result in 16.1 in query with string_agg(distinct+json_build_object
Bugreport: incorrect result in 16.1 in query with string_agg(distinct+json_build_object
От
Oscar van Baten
Дата:
/*
Hi,
We've upgraded one of our instances from 14.10 to 16.1 at AWS
All fine, except for a certain output which became different.
At another instance running at 14.7 we are able to reproduce this correct answer.
Removing the `DISTINCT` or replacing `(SELECT a.id)` with `a.id` does solve it.
It looks like a reference issue.
You can reproduce with the query below.
PS: I did scan https://wiki.postgresql.org/wiki/Todo but was not able to find a similar issue.
It does look like this one:
https://www.postgresql.org/message-id/3536307.1703952795%40sss.pgh.pa.us
But not sure if it's the same issue
*/
DROP TABLE IF EXISTS pg_temp.alldata;
create temp table pg_temp.alldata AS
(
SELECT 'val_A' AS id
);
SELECT
a.id as table_value,
STRING_AGG( 'val_B', ', ') AS string_agg_value,
JSON_AGG
(
JSON_BUILD_OBJECT
(
'val_1', a.id,
'val_2', (SELECT a.id)
)
) AS zzz_2
FROM pg_temp.alldata a
GROUP BY id;
at:
PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
output:
table_value|string_agg_value|zzz_2 |
-----------+----------------+----------------------------------------+
val_A |val_B |[{"val_1" : "val_A", "val_2" : "val_A"}]|
at:
PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
output:
table_value|string_agg_value|zzz_2 |
-----------+----------------+----------------------------------------+
val_A |val_B |[{"val_1" : "val_A", "val_2" : "val_B"}]| -- why is val_B suddenly here?
Hi,
We've upgraded one of our instances from 14.10 to 16.1 at AWS
All fine, except for a certain output which became different.
At another instance running at 14.7 we are able to reproduce this correct answer.
Removing the `DISTINCT` or replacing `(SELECT a.id)` with `a.id` does solve it.
It looks like a reference issue.
You can reproduce with the query below.
PS: I did scan https://wiki.postgresql.org/wiki/Todo but was not able to find a similar issue.
It does look like this one:
https://www.postgresql.org/message-id/3536307.1703952795%40sss.pgh.pa.us
But not sure if it's the same issue
*/
DROP TABLE IF EXISTS pg_temp.alldata;
create temp table pg_temp.alldata AS
(
SELECT 'val_A' AS id
);
SELECT
a.id as table_value,
STRING_AGG( 'val_B', ', ') AS string_agg_value,
JSON_AGG
(
JSON_BUILD_OBJECT
(
'val_1', a.id,
'val_2', (SELECT a.id)
)
) AS zzz_2
FROM pg_temp.alldata a
GROUP BY id;
at:
PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
output:
table_value|string_agg_value|zzz_2 |
-----------+----------------+----------------------------------------+
val_A |val_B |[{"val_1" : "val_A", "val_2" : "val_A"}]|
at:
PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
output:
table_value|string_agg_value|zzz_2 |
-----------+----------------+----------------------------------------+
val_A |val_B |[{"val_1" : "val_A", "val_2" : "val_B"}]| -- why is val_B suddenly here?
Oscar van Baten <info@oxcro.com> writes: > We've upgraded one of our instances from 14.10 to 16.1 at AWS > All fine, except for a certain output which became different. > At another instance running at 14.7 we are able to reproduce this correct > answer. > Removing the `DISTINCT` or replacing `(SELECT a.id)` with `a.id` does solve > it. > It looks like a reference issue. > You can reproduce with the query below. Hi, I failed to reproduce this. I get the expected output table_value | string_agg_value | zzz_2 -------------+------------------+------------------------------------------ val_A | val_B | [{"val_1" : "val_A", "val_2" : "val_A"}] (1 row) both in HEAD and at 16.1 release (3edc6580c). It seems to me that either you have some relevant non-default planner setting, or this is an AWS-specific bug. Could we see the output of EXPLAIN (VERBOSE, SETTINGS) for this query on the troublesome system? We do have an open report of a problem with DISTINCT aggregates, https://www.postgresql.org/message-id/flat/18264-e363593d7e9feb7d%40postgresql.org but it's unclear whether this is the same. regards, tom lane
Re: Bugreport: incorrect result in 16.1 in query with string_agg(distinct+json_build_object
От
Oscar van Baten
Дата:
Hi Tom,
There you go:
v14.7
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
GroupAggregate (cost=94.38..128.98 rows=200 width=96) |
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text|
Group Key: a.id |
-> Sort (cost=94.38..97.78 rows=1360 width=32) |
Output: a.id |
Sort Key: a.id |
-> Seq Scan on pg_temp_684.alldata a (cost=0.00..23.60 rows=1360 width=32) |
Output: a.id |
SubPlan 1 |
-> Result (cost=0.00..0.01 rows=1 width=32) |
Output: a.id |
Settings: effective_cache_size = '8014960kB', jit = 'off', search_path = 'public, public, "$user"' |
Query Identifier: -1271634773772018342 |
v16.1
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
GroupAggregate (cost=94.38..128.98 rows=200 width=96) |
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text|
Group Key: a.id |
-> Sort (cost=94.38..97.78 rows=1360 width=32) |
Output: a.id |
Sort Key: a.id |
-> Seq Scan on pg_temp.alldata a (cost=0.00..23.60 rows=1360 width=32) |
Output: a.id |
SubPlan 1 |
-> Result (cost=0.00..0.01 rows=1 width=32) |
Output: a.id |
Settings: effective_cache_size = '16179496kB', jit = 'off', temp_buffers = '64MB', search_path = 'public, public, "$user"' |
Query Identifier: 1935934247983289476 |
Will try to compare the planner settings too.
Regards,
Oscar
There you go:
v14.7
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
GroupAggregate (cost=94.38..128.98 rows=200 width=96) |
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text|
Group Key: a.id |
-> Sort (cost=94.38..97.78 rows=1360 width=32) |
Output: a.id |
Sort Key: a.id |
-> Seq Scan on pg_temp_684.alldata a (cost=0.00..23.60 rows=1360 width=32) |
Output: a.id |
SubPlan 1 |
-> Result (cost=0.00..0.01 rows=1 width=32) |
Output: a.id |
Settings: effective_cache_size = '8014960kB', jit = 'off', search_path = 'public, public, "$user"' |
Query Identifier: -1271634773772018342 |
v16.1
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
GroupAggregate (cost=94.38..128.98 rows=200 width=96) |
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text|
Group Key: a.id |
-> Sort (cost=94.38..97.78 rows=1360 width=32) |
Output: a.id |
Sort Key: a.id |
-> Seq Scan on pg_temp.alldata a (cost=0.00..23.60 rows=1360 width=32) |
Output: a.id |
SubPlan 1 |
-> Result (cost=0.00..0.01 rows=1 width=32) |
Output: a.id |
Settings: effective_cache_size = '16179496kB', jit = 'off', temp_buffers = '64MB', search_path = 'public, public, "$user"' |
Query Identifier: 1935934247983289476 |
Will try to compare the planner settings too.
Regards,
Oscar
Op wo 3 jan 2024 om 17:56 schreef Tom Lane <tgl@sss.pgh.pa.us>:
Oscar van Baten <info@oxcro.com> writes:
> We've upgraded one of our instances from 14.10 to 16.1 at AWS
> All fine, except for a certain output which became different.
> At another instance running at 14.7 we are able to reproduce this correct
> answer.
> Removing the `DISTINCT` or replacing `(SELECT a.id)` with `a.id` does solve
> it.
> It looks like a reference issue.
> You can reproduce with the query below.
Hi, I failed to reproduce this. I get the expected output
table_value | string_agg_value | zzz_2
-------------+------------------+------------------------------------------
val_A | val_B | [{"val_1" : "val_A", "val_2" : "val_A"}]
(1 row)
both in HEAD and at 16.1 release (3edc6580c). It seems to me that
either you have some relevant non-default planner setting, or this
is an AWS-specific bug. Could we see the output of
EXPLAIN (VERBOSE, SETTINGS)
for this query on the troublesome system?
We do have an open report of a problem with DISTINCT aggregates,
https://www.postgresql.org/message-id/flat/18264-e363593d7e9feb7d%40postgresql.org
but it's unclear whether this is the same.
regards, tom lane
Re: Bugreport: incorrect result in 16.1 in query with string_agg(distinct+json_build_object
От
Oscar van Baten
Дата:
I've found that v16.1 has 2 new settings:
enable_presorted_aggregate
recursive_worktable_factor
When I SET enable_presorted_aggregate = OFF: (non default)
The result changes to the correct one.
QUERY PLAN
GroupAggregate (cost=94.38..128.98 rows=200 width=96)
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text
Group Key: a.id
-> Sort (cost=94.38..97.78 rows=1360 width=32)
Output: a.id
Sort Key: a.id
-> Seq Scan on pg_temp.alldata a (cost=0.00..23.60 rows=1360 width=32)
Output: a.id
SubPlan 1
-> Result (cost=0.00..0.01 rows=1 width=32)
Output: a.id
Settings: effective_cache_size = '16179496kB', enable_presorted_aggregate = 'off', jit = 'off', temp_buffers = '64MB', search_path = 'public, public, "$user"'
Query Identifier: 1935934247983289476
And SET jit = ON; kept the wrong result.
Regards,
Oscar
enable_presorted_aggregate
recursive_worktable_factor
When I SET enable_presorted_aggregate = OFF: (non default)
The result changes to the correct one.
QUERY PLAN
GroupAggregate (cost=94.38..128.98 rows=200 width=96)
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text
Group Key: a.id
-> Sort (cost=94.38..97.78 rows=1360 width=32)
Output: a.id
Sort Key: a.id
-> Seq Scan on pg_temp.alldata a (cost=0.00..23.60 rows=1360 width=32)
Output: a.id
SubPlan 1
-> Result (cost=0.00..0.01 rows=1 width=32)
Output: a.id
Settings: effective_cache_size = '16179496kB', enable_presorted_aggregate = 'off', jit = 'off', temp_buffers = '64MB', search_path = 'public, public, "$user"'
Query Identifier: 1935934247983289476
And SET jit = ON; kept the wrong result.
Oscar
Op wo 3 jan 2024 om 19:01 schreef Oscar van Baten <info@oxcro.com>:
Hi Tom,
There you go:
v14.7
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
GroupAggregate (cost=94.38..128.98 rows=200 width=96) |
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text|
Group Key: a.id |
-> Sort (cost=94.38..97.78 rows=1360 width=32) |
Output: a.id |
Sort Key: a.id |
-> Seq Scan on pg_temp_684.alldata a (cost=0.00..23.60 rows=1360 width=32) |
Output: a.id |
SubPlan 1 |
-> Result (cost=0.00..0.01 rows=1 width=32) |
Output: a.id |
Settings: effective_cache_size = '8014960kB', jit = 'off', search_path = 'public, public, "$user"' |
Query Identifier: -1271634773772018342 |
v16.1
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
GroupAggregate (cost=94.38..128.98 rows=200 width=96) |
Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text|
Group Key: a.id |
-> Sort (cost=94.38..97.78 rows=1360 width=32) |
Output: a.id |
Sort Key: a.id |
-> Seq Scan on pg_temp.alldata a (cost=0.00..23.60 rows=1360 width=32) |
Output: a.id |
SubPlan 1 |
-> Result (cost=0.00..0.01 rows=1 width=32) |
Output: a.id |
Settings: effective_cache_size = '16179496kB', jit = 'off', temp_buffers = '64MB', search_path = 'public, public, "$user"' |
Query Identifier: 1935934247983289476 |
Will try to compare the planner settings too.
Regards,
OscarOp wo 3 jan 2024 om 17:56 schreef Tom Lane <tgl@sss.pgh.pa.us>:Oscar van Baten <info@oxcro.com> writes:
> We've upgraded one of our instances from 14.10 to 16.1 at AWS
> All fine, except for a certain output which became different.
> At another instance running at 14.7 we are able to reproduce this correct
> answer.
> Removing the `DISTINCT` or replacing `(SELECT a.id)` with `a.id` does solve
> it.
> It looks like a reference issue.
> You can reproduce with the query below.
Hi, I failed to reproduce this. I get the expected output
table_value | string_agg_value | zzz_2
-------------+------------------+------------------------------------------
val_A | val_B | [{"val_1" : "val_A", "val_2" : "val_A"}]
(1 row)
both in HEAD and at 16.1 release (3edc6580c). It seems to me that
either you have some relevant non-default planner setting, or this
is an AWS-specific bug. Could we see the output of
EXPLAIN (VERBOSE, SETTINGS)
for this query on the troublesome system?
We do have an open report of a problem with DISTINCT aggregates,
https://www.postgresql.org/message-id/flat/18264-e363593d7e9feb7d%40postgresql.org
but it's unclear whether this is the same.
regards, tom lane
Oscar van Baten <info@oxcro.com> writes: > There you go: > GroupAggregate (cost=94.38..128.98 rows=200 width=96) > Output: a.id, string_agg(DISTINCT 'val_B'::text, ', '::text), > (json_agg(json_build_object('val_1', a.id, 'val_2', (SubPlan 1))))::text| Ah, that gave me the clue I needed (maybe I should have deduced it from your Subject, but I plead ENOCAFFEINE). Your query as submitted doesn't use DISTINCT in the STRING_AGG call, but that's needed to trigger the bug. And it is indeed the same bug as #18264: it begins at the same commit, and David's WIP patch fixes it. regards, tom lane