|From:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|Subject:||Optimizing numeric SUM() aggregate|
I spent some time profiling a simple query with a SUM() aggregate. We've
made some big improvements in this area in recent years, but it seems
there's still some room for improvement. A lot of CPU time is spent in
the numeric add_var() and friends. Which isn't that surprising, I guess.
I came up with the attached patch that keeps the sum in a specialized
accumulator, instead of a NumericVar. The specialized accumulator has a
few tricks, compared to the status quo:
1. Uses 32-bit integers to represent each base-10000 "digit". Instead of
propagating carry after each new value, it's done only every 9999 values
(or at the end).
2. Accumulates positive and negative values separately. The positive
and negative sums are added together only at the end. This avoids the
overhead in add_var() of figuring out which value is larger and
determining the result sign at each step.
3. Allocates a new buffer only when it needs to be enlarged; add_abs()
allocates a new one on every call.
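To illustrate the deferred-carry idea, here is a minimal C sketch. The names (SumAccum, accum_add, accum_carry) are hypothetical, not the patch's, and only a single non-negative digit array is shown; per trick 2, the patch keeps separate positive and negative accumulators and combines them at the end.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000

/* Hypothetical simplified accumulator: 32-bit base-10000 "digits",
 * least significant first.  The caller must size the digit array
 * large enough for the final sum. */
typedef struct
{
	int32_t	   *digits;			/* base-10000 digits */
	int			ndigits;		/* allocated length */
	int			num_uncarried;	/* values added since last carry pass */
} SumAccum;

/* Propagate carry over the whole array, restoring digits < NBASE. */
static void
accum_carry(SumAccum *acc)
{
	int32_t		carry = 0;

	for (int i = 0; i < acc->ndigits; i++)
	{
		int32_t		v = acc->digits[i] + carry;

		carry = v / NBASE;
		acc->digits[i] = v % NBASE;
	}
	assert(carry == 0);			/* array was sized large enough */
	acc->num_uncarried = 0;
}

/* Add one value (an array of base-10000 digits) without carrying. */
static void
accum_add(SumAccum *acc, const int32_t *val, int nvaldigits)
{
	assert(nvaldigits <= acc->ndigits);
	for (int i = 0; i < nvaldigits; i++)
		acc->digits[i] += val[i];

	/*
	 * Each incoming digit is < NBASE, so after NBASE - 1 additions a
	 * digit is at most NBASE * (NBASE - 1), still well below INT32_MAX.
	 * Propagate carry now, before the next addition could overflow.
	 */
	if (++acc->num_uncarried == NBASE - 1)
		accum_carry(acc);
}
```

The key point is that accum_add is a plain digit-wise addition loop with no branching on carry, sign, or relative magnitude; the O(ndigits) carry pass runs only once per 9999 values.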
These optimizations give a nice speedup for SUM(), and other aggregates
like AVG() and STDDEV() that use the same agg state. For example, using
the same test query that Hadi Moshayedi used in previous work on numeric
aggregates:
CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
generate_series(1, 10000000) s;
SELECT avg(d) FROM avg_test;
On my laptop, with max_parallel_workers_per_gather=0, this runs in about
1.5 s without the patch, and 1.2 s with the patch.