RE: Partial aggregates pushdown

Поиск

Список

Период

Сортировка

От	Fujii.Yuki@df.MitsubishiElectric.co.jp"
Тема	RE: Partial aggregates pushdown
Дата	22 ноября 2023 г. 13:16:16
Msg-id	TYAPR01MB55149B8569F2FDA262654C4F95BAA@TYAPR01MB5514.jpnprd01.prod.outlook.com обсуждение исходный текст
Ответ на	Re: Partial aggregates pushdown (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
Ответы	Re: Partial aggregates pushdown (Bruce Momjian <bruce@momjian.us>) Re: Partial aggregates pushdown (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi Mr. Haas, hackers.

Thank you for your thoughtful comments.

> From: Robert Haas <robertmhaas@gmail.com>
> Sent: Tuesday, November 21, 2023 5:52 AM
> I do have a concern about this, though. It adds a lot of bloat. It adds a whole lot of additional entries to
pg_aggregate,and
 
> every new aggregate we add in the future will require a bonus entry for this, and it needs a bunch of new pg_proc
entries
> as well. One idea that I've had in the past is to instead introduce syntax that just does this, without requiring a
separate
> aggregate definition in each case.
> For example, maybe instead of changing string_agg(whatever) to string_agg_p_text_text(whatever), you can say
> PARTIAL_AGGREGATE
> string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or something. Then all aggregates could be treated
> in a generic way. I'm not completely sure that's better, but I think it's worth considering.
I believe this comment addresses a fundamental aspect of the approach.
So, firstly, could we discuss whether we should fundamentally reconsider the approach?

The approach adopted in this patch is as follows.
Approach 1: Adding partial aggregation functions to the catalogs(pg_aggregate, pg_proc)

The approach proposed by Mr.Haas is as follows.
Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation requests

The amount of code required to implement Approach 2 has not been investigated,
but comparing Approach 1 and Approach 2 in other aspects, 
I believe they each have the following advantages and disadvantages. 

1. Approach 1
(1) Advantages
(a) No need to change the SQL syntax
(2) Disadvantages
(a) Catalog bloat
As Mr.Haas pointed out, the catalog will bloat by adding partial aggregation functions (e.g. avg_p_int8(int8)) 
for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and pg_proc (theoretically doubling the size).
Some PostgreSQL developers and users may find this uncomfortable.
(b) Increase in manual procedures
Developers of new aggregate functions (both built-in and user-defined) need to manually add the partial aggregation
functions when defining the aggregate functions.
However, the procedure for adding partial aggregation functions for a certain aggregate function can be automated,
so this problem can be resolved by improving the patch.
The automation method involves the core part (AggregateCreate() and related functions) that executes
the CREATE AGGREGATE command for user-defined functions.
For built-in functions, it involves generating the initial data for the pg_aggregate catalog and pg_proc catalog from
pg_aggregate.datand pg_proc.dat
 
(using the genbki.pl script and related scripts).

2. Approach 2
(1) Advantages
(a) No need to add partial aggregate functions to the catalogs for each aggregation
(2) Disadvantages
(a) Need to add non-standard keywords to the SQL syntax.

I did not choose Approach2 because I was not confident that the disadvantage mentioned in 2.(2)(a)
would be accepted by the PostgreSQL development community.
If it is accepted, I think Approach 2 is smarter.
Could you please provide your opinion on which
approach is preferable after comparing these two approaches?
If we cannot say anything without comparing the amount of source code, as Mr.Momjian mentioned,
we need to estimate the amount of source code required to implement Approach2.

Sincerely yours,
Yuuki Fujii
 
--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

В списке pgsql-hackers по дате отправления:

Предыдущее

От: John Naylor
Дата: 22 ноября 2023 г., 13:07:47
Сообщение: Re: Output affected rows in EXPLAIN

Следующее

От: Xiang Gao
Дата: 22 ноября 2023 г., 13:16:44
Сообщение: RE: CRC32C Parallel Computation Optimization on ARM

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

RE: Partial aggregates pushdown

Предыдущее

Следующее