Parallel aggregates in PG 16.1

From: ZIMANYI Esteban <esteban(dot)zimanyi(at)ulb(dot)be>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Parallel aggregates in PG 16.1
Date: 2023-11-10 10:47:42
Message-ID: DB9P190MB1419B520B63F120567BD61918DAEA@DB9P190MB1419.EURP190.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In MobilityDB
https://github.com/MobilityDB/MobilityDB
we have defined a tstzspan type which is a fixed-size equivalent of the tstzrange type in PostgreSQL.

We have a span_union aggregate function which is the equivalent of the range_agg function in PostgreSQL defined as follows

CREATE FUNCTION tstzspan_union_finalfn(internal)
RETURNS tstzspanset
AS 'MODULE_PATHNAME', 'Span_union_finalfn'
LANGUAGE C IMMUTABLE PARALLEL SAFE;

CREATE AGGREGATE span_union(tstzspan) (
SFUNC = array_agg_transfn,
STYPE = internal,
COMBINEFUNC = array_agg_combine,
SERIALFUNC = array_agg_serialize,
DESERIALFUNC = array_agg_deserialize,
FINALFUNC = tstzspan_union_finalfn
);

As can be seen, we reuse the array_agg function to accumulate the values in an array and the final function just does similar work as the range_agg_finalfn to merge the overlapping spans.

I am testing the parallel aggregate features of PG 16.1

test=# select version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit

I create a table with 1M random spans and another one with the same data converted to tstzrange

CREATE TABLE tbl_tstzspan_1M AS
SELECT k, random_tstzspan('2001-01-01', '2002-12-31', 10) AS t
FROM generate_series(1, 1e6) AS k;

CREATE TABLE tbl_tstzrange_1M AS
SELECT k, t::tstzrange
FROM tbl_tstzspan_1M;

test=# analyze;
ANALYZE
test=#

The tstzrange DOES NOT support parallel aggregates

test=# EXPLAIN
SELECT k%10, range_agg(t) AS t
FROM tbl_tstzrange_1M
group by k%10
order by k%10;

QUERY PLAN
-----------------------------------------------------------------------------------------------------
GroupAggregate (cost=66706.17..203172.65 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Gather Merge (cost=66706.17..183172.65 rows=1000000 width=54)
Workers Planned: 2
-> Sort (cost=65706.15..66747.81 rows=416667 width=54)
Sort Key: ((k % '10'::numeric))
-> Parallel Seq Scan on tbl_tstzrange_1m (cost=0.00..12568.33 rows=416667 width=54)
(7 rows)

The array_agg function supports parallel aggregates

test=# EXPLAIN
SELECT k%10, array_agg(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=66706.17..193518.60 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Gather Merge (cost=66706.17..172268.60 rows=833334 width=64)
Workers Planned: 2
-> Partial GroupAggregate (cost=65706.15..75081.15 rows=416667 width=64)
Group Key: ((k % '10'::numeric))
-> Sort (cost=65706.15..66747.81 rows=416667 width=56)
Sort Key: ((k % '10'::numeric))
-> Parallel Seq Scan on tbl_tstzspan_1m (cost=0.00..12568.33 rows=416667 width=56)
(9 rows)

We are not able to make span_union aggregate support parallel aggregates

test=# EXPLAIN
SELECT k%10, span_union(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
QUERY PLAN
--------------------------------------------------------------------------------------
GroupAggregate (cost=187879.84..210379.84 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Sort (cost=187879.84..190379.84 rows=1000000 width=56)
Sort Key: ((k % '10'::numeric))
-> Seq Scan on tbl_tstzspan_1m (cost=0.00..19860.00 rows=1000000 width=56)

Any suggestion?

Thanks

Esteban

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-11-10 11:00:36 Re: Synchronizing slots from primary to standby
Previous Message Amit Kapila 2023-11-10 09:57:25 Re: A recent message added to pg_upgade