From: | ZIMANYI Esteban <esteban(dot)zimanyi(at)ulb(dot)be> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Parallel aggregates in PG 16.1 |
Date: | 2023-11-10 10:47:42 |
Message-ID: | DB9P190MB1419B520B63F120567BD61918DAEA@DB9P190MB1419.EURP190.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
In MobilityDB
https://github.com/MobilityDB/MobilityDB
we have defined a tstzspan type which is a fixed-size equivalent of the tstzrange type in PostgreSQL.
We have a span_union aggregate function which is the equivalent of the range_agg function in PostgreSQL defined as follows
CREATE FUNCTION tstzspan_union_finalfn(internal)
RETURNS tstzspanset
AS 'MODULE_PATHNAME', 'Span_union_finalfn'
LANGUAGE C IMMUTABLE PARALLEL SAFE;
CREATE AGGREGATE span_union(tstzspan) (
SFUNC = array_agg_transfn,
STYPE = internal,
COMBINEFUNC = array_agg_combine,
SERIALFUNC = array_agg_serialize,
DESERIALFUNC = array_agg_deserialize,
FINALFUNC = tstzspan_union_finalfn
);
As can be seen, we reuse the array_agg function to accumulate the values in an array and the final function just does similar work as the range_agg_finalfn to merge the overlapping spans.
I am testing the parallel aggregate features of PG 16.1
test=# select version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
I create a table with 1M random spans and another one with the same data converted to tstzrange
CREATE TABLE tbl_tstzspan_1M AS
SELECT k, random_tstzspan('2001-01-01', '2002-12-31', 10) AS t
FROM generate_series(1, 1e6) AS k;
CREATE TABLE tbl_tstzrange_1M AS
SELECT k, t::tstzrange
FROM tbl_tstzspan_1M;
test=# analyze;
ANALYZE
test=#
The tstzrange DOES NOT support parallel aggregates
test=# EXPLAIN
SELECT k%10, range_agg(t) AS t
FROM tbl_tstzrange_1M
group by k%10
order by k%10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
GroupAggregate (cost=66706.17..203172.65 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Gather Merge (cost=66706.17..183172.65 rows=1000000 width=54)
Workers Planned: 2
-> Sort (cost=65706.15..66747.81 rows=416667 width=54)
Sort Key: ((k % '10'::numeric))
-> Parallel Seq Scan on tbl_tstzrange_1m (cost=0.00..12568.33 rows=416667 width=54)
(7 rows)
The array_agg function supports parallel aggregates
test=# EXPLAIN
SELECT k%10, array_agg(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=66706.17..193518.60 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Gather Merge (cost=66706.17..172268.60 rows=833334 width=64)
Workers Planned: 2
-> Partial GroupAggregate (cost=65706.15..75081.15 rows=416667 width=64)
Group Key: ((k % '10'::numeric))
-> Sort (cost=65706.15..66747.81 rows=416667 width=56)
Sort Key: ((k % '10'::numeric))
-> Parallel Seq Scan on tbl_tstzspan_1m (cost=0.00..12568.33 rows=416667 width=56)
(9 rows)
We are not able to make span_union aggregate support parallel aggregates
test=# EXPLAIN
SELECT k%10, span_union(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
QUERY PLAN
--------------------------------------------------------------------------------------
GroupAggregate (cost=187879.84..210379.84 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Sort (cost=187879.84..190379.84 rows=1000000 width=56)
Sort Key: ((k % '10'::numeric))
-> Seq Scan on tbl_tstzspan_1m (cost=0.00..19860.00 rows=1000000 width=56)
Any suggestion?
Thanks
Esteban
From | Date | Subject | |
---|---|---|---|
Next Message | Drouvot, Bertrand | 2023-11-10 11:00:36 | Re: Synchronizing slots from primary to standby |
Previous Message | Amit Kapila | 2023-11-10 09:57:25 | Re: A recent message added to pg_upgade |