Обсуждение: Parallel aggregates in PG 16.1

Поиск
Список
Период
Сортировка

Parallel aggregates in PG 16.1

От
ZIMANYI Esteban
Дата:
In MobilityDB
https://github.com/MobilityDB/MobilityDB
we have defined a tstzspan type which is a fixed-size equivalent of the tstzrange type in PostgreSQL.

We have a span_union aggregate function which is the equivalent of the range_agg function in PostgreSQL defined as follows

CREATE FUNCTION tstzspan_union_finalfn(internal)
  RETURNS tstzspanset
  AS 'MODULE_PATHNAME', 'Span_union_finalfn'
  LANGUAGE C IMMUTABLE PARALLEL SAFE;

CREATE AGGREGATE span_union(tstzspan) (
  SFUNC = array_agg_transfn,
  STYPE = internal,
  COMBINEFUNC = array_agg_combine,
  SERIALFUNC = array_agg_serialize,
  DESERIALFUNC = array_agg_deserialize,
  FINALFUNC = tstzspan_union_finalfn
);

As can be seen, we reuse the array_agg function to accumulate the values in an array and the final function just does similar work as the range_agg_finalfn to merge the overlapping spans.

I am testing the parallel aggregate features of PG 16.1

test=# select version();
                                                version
-------------------------------------------------------------------------------------------------------
 PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit

I create a table with 1M random spans and another one with the same data converted to tstzrange

CREATE TABLE tbl_tstzspan_1M AS
SELECT k, random_tstzspan('2001-01-01', '2002-12-31', 10) AS t
FROM generate_series(1, 1e6) AS k;

CREATE TABLE tbl_tstzrange_1M AS
SELECT k, t::tstzrange
FROM tbl_tstzspan_1M;

test=# analyze;
ANALYZE
test=#

The tstzrange DOES NOT support parallel aggregates

test=# EXPLAIN
SELECT k%10, range_agg(t) AS t
FROM tbl_tstzrange_1M
group by k%10
order by k%10;

                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=66706.17..203172.65 rows=1000000 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Gather Merge  (cost=66706.17..183172.65 rows=1000000 width=54)
         Workers Planned: 2
         ->  Sort  (cost=65706.15..66747.81 rows=416667 width=54)
               Sort Key: ((k % '10'::numeric))
               ->  Parallel Seq Scan on tbl_tstzrange_1m  (cost=0.00..12568.33 rows=416667 width=54)
(7 rows)

The array_agg function supports parallel aggregates

test=# EXPLAIN
SELECT k%10, array_agg(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=66706.17..193518.60 rows=1000000 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Gather Merge  (cost=66706.17..172268.60 rows=833334 width=64)
         Workers Planned: 2
         ->  Partial GroupAggregate  (cost=65706.15..75081.15 rows=416667 width=64)
               Group Key: ((k % '10'::numeric))
               ->  Sort  (cost=65706.15..66747.81 rows=416667 width=56)
                     Sort Key: ((k % '10'::numeric))
                     ->  Parallel Seq Scan on tbl_tstzspan_1m  (cost=0.00..12568.33 rows=416667 width=56)
(9 rows)

We are not able to make span_union aggregate support parallel aggregates

test=# EXPLAIN
SELECT k%10, span_union(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 GroupAggregate  (cost=187879.84..210379.84 rows=1000000 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Sort  (cost=187879.84..190379.84 rows=1000000 width=56)
         Sort Key: ((k % '10'::numeric))
         ->  Seq Scan on tbl_tstzspan_1m  (cost=0.00..19860.00 rows=1000000 width=56)

Any suggestion?

Thanks

Esteban

 

Re: Parallel aggregates in PG 16.1

От
Matthias van de Meent
Дата:
On Fri, 10 Nov 2023 at 11:47, ZIMANYI Esteban <esteban.zimanyi@ulb.be> wrote:
>
> In MobilityDB
> https://github.com/MobilityDB/MobilityDB
> we have defined a tstzspan type which is a fixed-size equivalent of the tstzrange type in PostgreSQL.
>
> We have a span_union aggregate function which is the equivalent of the range_agg function in PostgreSQL defined as
follows
>
> CREATE FUNCTION tstzspan_union_finalfn(internal)
>   RETURNS tstzspanset
>   AS 'MODULE_PATHNAME', 'Span_union_finalfn'
>   LANGUAGE C IMMUTABLE PARALLEL SAFE;
>
> CREATE AGGREGATE span_union(tstzspan) (
>   SFUNC = array_agg_transfn,
>   STYPE = internal,
>   COMBINEFUNC = array_agg_combine,
>   SERIALFUNC = array_agg_serialize,
>   DESERIALFUNC = array_agg_deserialize,
>   FINALFUNC = tstzspan_union_finalfn
> );
>
> As can be seen, we reuse the array_agg function to accumulate the values in an array and the final function just does
similarwork as the range_agg_finalfn to merge the overlapping spans.
 

Did you note the following section in the CREATE AGGREGATE documentation [0]?

"""
An aggregate can optionally support partial aggregation, as described
in Section 38.12.4.
This requires specifying the COMBINEFUNC parameter. If the
state_data_type is internal, it's usually also appropriate to provide
the SERIALFUNC and DESERIALFUNC parameters so that parallel
aggregation is possible.
Note that the aggregate must also be marked PARALLEL SAFE to enable
parallel aggregation.
"""

From this, it seems like the PARALLEL = SAFE argument is missing from
your aggregate definition as provided above.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://www.postgresql.org/docs/16/sql-createaggregate.html