Обсуждение: Range partitioning query performance with date_trunc (vs timescaledb)

Поиск
Список
Период
Сортировка

Range partitioning query performance with date_trunc (vs timescaledb)

От
Philippe Pepiot
Дата:
Hi,

I'm trying to implement some range partitioning on timeseries data. But it
looks some queries involving date_trunc() doesn't make use of partitioning.

BEGIN;
CREATE TABLE test (
    time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    value FLOAT NOT NULL
) PARTITION BY RANGE (time);
CREATE INDEX test_time_idx ON test(time DESC);
CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
ROLLBACK;

The plan query all partitions:

HashAggregate
  Group Key: (date_trunc('year'::text, test."time"))
  ->  Append
        ->  Seq Scan on test_y2010 test_1
              Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)
        ->  Seq Scan on test_y2011 test_2
              Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)


The view is there so show the use case, but we get almost similar plan with SELECT * FROM test WHERE DATE_TRUNC('year',
time)>= TIMESTAMP '2021-01-01';
 


I tested a variation with timescaledb which seem using trigger based
partitioning:

BEGIN;
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE test (
    time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    value FLOAT NOT NULL
);
SELECT create_hypertable('test', 'time', chunk_time_interval => INTERVAL '1 year');
CREATE VIEW vtest AS SELECT time_bucket('1 year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
-- insert some data as partitions are created on the fly
INSERT INTO test VALUES (TIMESTAMP '2020-01-15', 1.0), (TIMESTAMP '2021-12-15', 2.0);
\d+ test
EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
ROLLBACK;


The plan query a single partition:

GroupAggregate
  Group Key: (time_bucket('1 year'::interval, _hyper_1_2_chunk."time"))
  ->  Result
        ->  Index Scan Backward using _hyper_1_2_chunk_test_time_idx on _hyper_1_2_chunk
              Index Cond: ("time" >= '2021-01-01 00:00:00'::timestamp without time zone)
              Filter: (time_bucket('1 year'::interval, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)


Note single partition query only works with time_bucket(), not with date_trunc(), I guess
there is some magic regarding this in time_bucket() implementation.


I wonder if there is a way with a reasonable amount of SQL code to achieve this
with vanilla postgres ?

Maybe by taking assumption that DATE_TRUNC(..., time) <= time ?

Thanks!



Re: Range partitioning query performance with date_trunc (vs timescaledb)

От
David Rowley
Дата:
On Tue, 29 Aug 2023 at 19:40, Philippe Pepiot <phil@philpep.org> wrote:
> I'm trying to implement some range partitioning on timeseries data. But it
> looks some queries involving date_trunc() doesn't make use of partitioning.
>
> BEGIN;
> CREATE TABLE test (
>     time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
>     value FLOAT NOT NULL
> ) PARTITION BY RANGE (time);
> CREATE INDEX test_time_idx ON test(time DESC);
> CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
> CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
> CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
> EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
> ROLLBACK;
>
> The plan query all partitions:

> I wonder if there is a way with a reasonable amount of SQL code to achieve this
> with vanilla postgres ?

The only options I see for you are

1) partition by LIST(date_Trunc('year', time)), or;
2) use a set-returning function instead of a view and pass the date
range you want to select from the underlying table via parameters.

I imagine you won't want to do #1. However, it would at least also
allow the aggregation to be performed before the Append if you SET
enable_partitionwise_aggregate=1.

#2 isn't as flexible as a view as you'd have to create another
function or expand the parameters of the existing one if you want to
add items to the WHERE clause.

Unfortunately, date_trunc is just a black box to partition pruning, so
it's not able to determine that DATE_TRUNC('year', time) >=
'2021-01-01'  is the same as time >= '2021-01-01'.  It would be
possible to make PostgreSQL do that, but that's a core code change,
not something that you can do from SQL.

David



Re: Range partitioning query performance with date_trunc (vs timescaledb)

От
Philippe Pepiot
Дата:
On 29/08/2023, David Rowley wrote:
> On Tue, 29 Aug 2023 at 19:40, Philippe Pepiot <phil@philpep.org> wrote:
> > I'm trying to implement some range partitioning on timeseries data. But it
> > looks some queries involving date_trunc() doesn't make use of partitioning.
> >
> > BEGIN;
> > CREATE TABLE test (
> >     time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
> >     value FLOAT NOT NULL
> > ) PARTITION BY RANGE (time);
> > CREATE INDEX test_time_idx ON test(time DESC);
> > CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
> > CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
> > CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
> > EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
> > ROLLBACK;
> >
> > The plan query all partitions:
> 
> > I wonder if there is a way with a reasonable amount of SQL code to achieve this
> > with vanilla postgres ?
> 
> The only options I see for you are
> 
> 1) partition by LIST(date_Trunc('year', time)), or;
> 2) use a set-returning function instead of a view and pass the date
> range you want to select from the underlying table via parameters.
> 
> I imagine you won't want to do #1. However, it would at least also
> allow the aggregation to be performed before the Append if you SET
> enable_partitionwise_aggregate=1.
> 
> #2 isn't as flexible as a view as you'd have to create another
> function or expand the parameters of the existing one if you want to
> add items to the WHERE clause.
> 
> Unfortunately, date_trunc is just a black box to partition pruning, so
> it's not able to determine that DATE_TRUNC('year', time) >=
> '2021-01-01'  is the same as time >= '2021-01-01'.  It would be
> possible to make PostgreSQL do that, but that's a core code change,
> not something that you can do from SQL.

Ok I think I'll go for Set-returning function since
LIST or RANGE on (date_trunc('year', time)) will break advantage of
partitioning when querying with "time betwen x and y".

Thanks!