Re: Combining Aggregates

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Combining Aggregates
Дата
Msg-id CAKJS1f9u0eC-U0cE_ottqYFuNx2Ucu_73KXPFVr-48zZ3Gkzew@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Combining Aggregates  (David Steele <david@pgmasters.net>)
Ответы Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
On 12 March 2016 at 06:28, David Steele <david@pgmasters.net> wrote:
> As far as I can tell this is the most recent version of the patch but it
> doesn't apply on master.  I'm guessing that's mostly because of Tom's
> UPP patch.
>
> This is a long and confusing thread and I should know since I just read
> through the entire thing.  I'm setting this back to "waiting for author"
> and I'd like see a rebased patch and a summary of what's in the patch
> before setting it back to "needs review".

Many thank for taking the time to read through this thread.

Let me try to summarise the current status.

Overview:
Combine Aggregate states is required to allow aggregate functions to
be calculated in multiple stages. The primary intended use for this,
as of now is for allowing parallel aggregate. A simple combine
function for count(*) would perform something very similar to sum() of
all of the partially calculated aggregate states. In this case
"partially" means aggregated states without the final function called.

Status:
As of [1] we have basic infrastructure in place to allow this, but
that commit only added the infrastructure for aggregate functions
which don't have an INTERNAL state type. Many aggregates do have an
internal state type, and for the case of parallel aggregate, since
these internal states are just passed around as pointers to memory
which belongs to the process wherever the member was allocated, we're
unable to pass these back to the main process for the final aggregate
stage. This patch allows this to happen by way of serializing these
internal states into some type that can be copied into the main
backend process.
Robert was quite right to leave this part out of the commit as at the
time the method I was using was under debate, and there was no reason
to hold things up, as the more simple case could be committed to allow
work to continue on features which would use this new infrastructure.

These threads are a bit of a mess as there's often confusion (from me
too) about which code belongs in which patch. Quite possibly the
setrefs.c stuff of the parallel aggregate patch actually do belong
here, but it's probably not worth shuffling things around as I think
the parallel agg stuff should go in before the additional serial
states stuff.

Current patch:
I've now updated the patch to base it on top of the parallel aggregate
patch in [2]. To apply the attached, you must apply [2] first!
The changes to numeric.c are not really intended for commit for now. I
plan on spending a bit of time first on looking at using send
functions and bytea serial types, which if that works should be far
more efficient than converting the serialized output into a string.
The example combine function in numeric.c is just intended to show
that this does actually work.

I've also left one opr_sanity test failing for now. This test is
complaining that the de-serialisation function returns an INTERNAL
type and does not accept one (which would disallow it from being
called from SQL interface). I think this is safe to allow as I've
coded the function to fail if it's not called in aggregate context.
Perhaps I'm wrong though, so I left the test failing to highlight that
I'd like someone else to look and maybe think about this a bit. It
seems very important not to get this part wrong.

Comments and reviews etc all welcome. Please remember to apply [2]
before the attached.

[1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a7de3dc5c346e07e0439275982569996e645b3c2
[2] http://www.postgresql.org/message-id/CAKJS1f9Tr-+9aKWZ1XuHUnEJZ7GKKfo45b+4fCNj8DkrDZYK4g@mail.gmail.com

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: pg_stat_get_progress_info(NULL) blows up
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Relation extension scalability