Re: Partial aggregates pushdown

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: Partial aggregates pushdown
Дата
Msg-id ZC95C0+PVhVP3iax@momjian.us
обсуждение исходный текст
Ответ на RE: Partial aggregates pushdown  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Ответы RE: Partial aggregates pushdown  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Список pgsql-hackers
On Fri, Mar 31, 2023 at 05:49:21AM +0000, Fujii.Yuki@df.MitsubishiElectric.co.jp wrote:
> Hi Mr.Momjian
> 
> > First, am I correct?
> Yes, you are correct. This patch uses new special aggregate functions for partial aggregate
> (then we call this partialaggfunc).

First, my apologies for not addressing this sooner.  I was so focused on
my own tasks that I didn't realize this very important patch was not
getting attention.  I will try my best to get it into PG 17.

What amazes me is that you didn't need to create _any_ actual aggregate
functions.  Rather, you just needed to hook existing functions into the
aggregate tables for partial FDW execution.

> > Second, how far away is this from being committable
> > and/or what work needs to be done to get it committable, either for PG 16 or 17?
> I believe there are three: 1. and 2. are not clear if they are necessary or not; 3. are clearly necessary.
> I would like to hear the opinions of the development community on whether or not 1. and 2. need to be addressed.
> 
> 1. Making partialaggfunc user-defined function
> In v17, I make partialaggfuncs as built-in functions.
> Because of this approach, v17 changes specification of BKI file and pg_aggregate.
> For now, partialaggfuncs are needed by only postgres_fdw which is just an extension of PostgreSQL.
> In the future, when revising the specifications for BKI files and pg_aggregate when modifying existing PostgreSQL
functions,
> It is necessary to align them with this patch's changes.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to making partialaggfunc as user defined function.

I think we have three possible cases for aggregates pushdown to FDWs:

1)  Postgres built-in aggregate functions
2)  Postgres user-defined & extension aggregate functions
3)  aggregate functions calls to non-PG FDWs

Your patch handles #1 by checking that the FDW Postgres version is the
same as the calling Postgres version.  However, it doesn't check for
extension versions, and frankly, I don't see how we could implement that
cleanly without significant complexity.

I suggest we remove the version check requirement --- instead just
document that the FDW Postgres version should be the same or newer than
the calling Postgres server --- that way, we can assume that whatever is
in the system catalogs of the caller is in the receiving side.  We
should add a GUC to turn off this optimization for cases where the FDW
Postgres version is older than the caller.  This handles case 1-2.

For case 3, I don't even know how much pushdown those do of _any_
aggregates to non-PG servers, let along parallel FDW ones.  Does anyone
know the details?

> 2. Automation of creating definition of partialaggfuncs
> In development of v17, I manually create definition of partialaggfuncs for avg, min, max, sum, count.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to automate creating definition of partialaggfuncs
> for all built-in aggregate functions.

Are there any other builtin functions that need this?  I think we can
just provide documention for extensions on how to do this.

> 3. Documentation
> I need add explanation of partialaggfunc to documents on postgres_fdw and other places.

I can help with that once we decide on the above.

I think 'partialaggfn' should be named 'aggpartialfn' to match other
columns in pg_aggregate.

I am confused by these changes to pg_aggegate:

+{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
+  aggfinalfn => 'int8_avg_serialize', aggcombinefn => 'int8_avg_combine',
+  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
+  aggtranstype => 'internal', aggtransspace => '48' },

...

+{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
+  aggfinalfn => 'numeric_avg_serialize', aggcombinefn => 'numeric_avg_combine',
+  aggserialfn => 'numeric_avg_serialize',
+  aggdeserialfn => 'numeric_avg_deserialize',
+  aggtranstype => 'internal', aggtransspace => '128' },

Why are these marked as 'sum' but use 'avg' functions?

It would be good to explain exactly how this is diffent from background
worker parallelism.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Minimal logical decoding on standbys
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Minimal logical decoding on standbys