From: | Philippe Pepiot <phil(at)philpep(dot)org> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Range partitioning query performance with date_trunc (vs timescaledb) |
Date: | 2023-08-29 07:40:06 |
Message-ID: | ZO2g1l1ho60LSEBt@bezout.in.philpep.org |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
I'm trying to implement some range partitioning on timeseries data. But it
looks some queries involving date_trunc() doesn't make use of partitioning.
BEGIN;
CREATE TABLE test (
time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
value FLOAT NOT NULL
) PARTITION BY RANGE (time);
CREATE INDEX test_time_idx ON test(time DESC);
CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
ROLLBACK;
The plan query all partitions:
HashAggregate
Group Key: (date_trunc('year'::text, test."time"))
-> Append
-> Seq Scan on test_y2010 test_1
Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)
-> Seq Scan on test_y2011 test_2
Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)
The view is there so show the use case, but we get almost similar plan with SELECT * FROM test WHERE DATE_TRUNC('year', time) >= TIMESTAMP '2021-01-01';
I tested a variation with timescaledb which seem using trigger based
partitioning:
BEGIN;
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE test (
time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
value FLOAT NOT NULL
);
SELECT create_hypertable('test', 'time', chunk_time_interval => INTERVAL '1 year');
CREATE VIEW vtest AS SELECT time_bucket('1 year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
-- insert some data as partitions are created on the fly
INSERT INTO test VALUES (TIMESTAMP '2020-01-15', 1.0), (TIMESTAMP '2021-12-15', 2.0);
\d+ test
EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
ROLLBACK;
The plan query a single partition:
GroupAggregate
Group Key: (time_bucket('1 year'::interval, _hyper_1_2_chunk."time"))
-> Result
-> Index Scan Backward using _hyper_1_2_chunk_test_time_idx on _hyper_1_2_chunk
Index Cond: ("time" >= '2021-01-01 00:00:00'::timestamp without time zone)
Filter: (time_bucket('1 year'::interval, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)
Note single partition query only works with time_bucket(), not with date_trunc(), I guess
there is some magic regarding this in time_bucket() implementation.
I wonder if there is a way with a reasonable amount of SQL code to achieve this
with vanilla postgres ?
Maybe by taking assumption that DATE_TRUNC(..., time) <= time ?
Thanks!
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2023-08-29 09:38:05 | Re: Range partitioning query performance with date_trunc (vs timescaledb) |
Previous Message | Peter Geoghegan | 2023-08-29 01:49:13 | Re: Index bloat and REINDEX/VACUUM optimization for partial index |