Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN

Поиск
Список
Период
Сортировка
От sulfinu@gmail.com
Тема Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN
Дата
Msg-id CAGH1kmzre_Nr+mnErVLbAgGpOAmho_VQwY+uMzmVqEzaZKkPEA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN  (sulfinu@gmail.com)
Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Hello,

I'm sorry if this message brings up once more an already settled issue, but there's no public list of  bug reports for PostgreSQL (the release notes contain only acknowledged, solved bugs).

This problem was noticed in version 15.1, as I don't have the latest version at hand, but I've read the release notes for 15.2 and 15.3 and there's no sign of any related change. It's about the outcome of a SELECT statement which involves computing aggregate functions like json_agg, jsonb_agg and array_agg over columns that originate from outer-joined entities, when these entities are missing.

Here's a very simple schema to illustrate the problem:
create table a (
  id serial2 primary key,
  name text
);

create table b (
  name text primary key,
  a int2 references a(id)
);

insert into a (name) values ('a1'), ('a2');
insert into b (name, a) values
  ('b for a1', 1), ('another b for a1', 1);
Notice that entity named a2 in table a has no corresponding bs. Now run this query:
select a.*, json_agg(b) from
  a
  left join b on b.a = a.id
  group by a.id;
The aggregation column for entity a2 has value [null] (if array_agg() was used, the obtained value would be {NULL}).

Since the query without aggregation
select a.*, b.* from
  a
  left join b on b.a = a.id;
produces an entry for entity a2 with NULL values for columns belonging to table b, I expect the aggregation to produce either [{"name":null,"a":null}] or the SQL NULL for it, preferrably the latter.

I suppose there's a dillema regarding what to return: the aggregation is computed over one row, but its values are actually missing. The curent choice of PostgreSQL is somewhere in-between, which is inappropriate from both perspectives - a political compromise. Moreover, in the case of array_agg(), the returned value could have also originated from a single-row single-column actual NULL value!

Returning a simple SQL NULL is the most appropriate choice, in line with the general rule that aggregations over zero rows return NULL. For the database client, it has the benefit of being both cheap to detect and distinguishable from other cases.

Although there is a workaround for this problem, it implies checking a stupid condition for absolutely every row in the JOIN result:
select a.*, json_agg(b) filter (where b.name is not null) from
  a
  left join b on b.a = a.id
  group by a.id;

Thoughts?

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Les
Дата:
Сообщение: initdb faild to initialize full text search dictionaries
Следующее
От: sulfinu@gmail.com
Дата:
Сообщение: Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN