RE: Partial aggregates pushdown

Поиск
Список
Период
Сортировка
От Fujii.Yuki@df.MitsubishiElectric.co.jp"
Тема RE: Partial aggregates pushdown
Дата
Msg-id TYAPR01MB55149B8569F2FDA262654C4F95BAA@TYAPR01MB5514.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Partial aggregates pushdown  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
Ответы Re: Partial aggregates pushdown  (Bruce Momjian <bruce@momjian.us>)
Re: Partial aggregates pushdown  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi Mr. Haas, hackers.

Thank you for your thoughtful comments.

> From: Robert Haas <robertmhaas@gmail.com>
> Sent: Tuesday, November 21, 2023 5:52 AM
> I do have a concern about this, though. It adds a lot of bloat. It adds a whole lot of additional entries to
pg_aggregate,and
 
> every new aggregate we add in the future will require a bonus entry for this, and it needs a bunch of new pg_proc
entries
> as well. One idea that I've had in the past is to instead introduce syntax that just does this, without requiring a
separate
> aggregate definition in each case.
> For example, maybe instead of changing string_agg(whatever) to string_agg_p_text_text(whatever), you can say
> PARTIAL_AGGREGATE
> string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or something. Then all aggregates could be treated
> in a generic way. I'm not completely sure that's better, but I think it's worth considering.
I believe this comment addresses a fundamental aspect of the approach.
So, firstly, could we discuss whether we should fundamentally reconsider the approach?

The approach adopted in this patch is as follows.
Approach 1: Adding partial aggregation functions to the catalogs(pg_aggregate, pg_proc)

The approach proposed by Mr.Haas is as follows.
Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation requests

The amount of code required to implement Approach 2 has not been investigated,
but comparing Approach 1 and Approach 2 in other aspects, 
I believe they each have the following advantages and disadvantages. 

1. Approach 1
(1) Advantages
(a) No need to change the SQL syntax
(2) Disadvantages
(a) Catalog bloat
As Mr.Haas pointed out, the catalog will bloat by adding partial aggregation functions (e.g. avg_p_int8(int8)) 
for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and pg_proc (theoretically doubling the size).
Some PostgreSQL developers and users may find this uncomfortable.
(b) Increase in manual procedures
Developers of new aggregate functions (both built-in and user-defined) need to manually add the partial aggregation
functions when defining the aggregate functions.
However, the procedure for adding partial aggregation functions for a certain aggregate function can be automated,
so this problem can be resolved by improving the patch.
The automation method involves the core part (AggregateCreate() and related functions) that executes
the CREATE AGGREGATE command for user-defined functions.
For built-in functions, it involves generating the initial data for the pg_aggregate catalog and pg_proc catalog from
pg_aggregate.datand pg_proc.dat
 
(using the genbki.pl script and related scripts).

2. Approach 2
(1) Advantages
(a) No need to add partial aggregate functions to the catalogs for each aggregation
(2) Disadvantages
(a) Need to add non-standard keywords to the SQL syntax.

I did not choose Approach2 because I was not confident that the disadvantage mentioned in 2.(2)(a)
would be accepted by the PostgreSQL development community.
If it is accepted, I think Approach 2 is smarter.
Could you please provide your opinion on which
approach is preferable after comparing these two approaches?
If we cannot say anything without comparing the amount of source code, as Mr.Momjian mentioned,
we need to estimate the amount of source code required to implement Approach2.

Sincerely yours,
Yuuki Fujii
 
--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: Output affected rows in EXPLAIN
Следующее
От: Xiang Gao
Дата:
Сообщение: RE: CRC32C Parallel Computation Optimization on ARM