| From: | Andreas Karlsson <andreas(at)proxel(dot)se> | 
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | [WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates | 
| Date: | 2014-10-25 14:38:41 | 
| Message-ID: | 544BB5F1.50709@proxel.se | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
There was recently talk about if we should start using 128-bit integers 
(where available) to speed up the aggregate functions over integers 
which uses numeric for their internal state. So I hacked together a 
patch for this to see what the performance gain would be.
Previous thread: 
http://www.postgresql.org/message-id/20141017182500.GF2075@alap3.anarazel.de
What the patch does is switching from using numerics in the aggregate 
state to int128 and then convert the type from the 128-bit integer in 
the final function.
The functions where we can make use of int128 states are:
- sum(int8)
- avg(int8)
- var_*(int2)
- var_*(int4)
- stdev_*(int2)
- stdev_*(int4)
The initial benchmark results look very promising. When summing 10 
million int8 I get a speedup of ~2.5x and similarly for var_samp() on 10 
million int4 I see a speed up of ~3.7x. To me this indicates that it is 
worth the extra code. What do you say? Is this worth implementing?
The current patch still requires work. I have not written the detection 
of int128 support yet, and the patch needs code cleanup (for example: I 
used an int16_ prefix on the added functions, suggestions for better 
names are welcome). I also need to decide on what estimate to use for 
the size of that state.
The patch should work and pass make check on platforms where __int128_t 
is supported.
The simple benchmarks:
CREATE TABLE test_int8 AS SELECT x::int8 FROM generate_series(1, 
10000000) x;
Before:
# SELECT sum(x) FROM test_int8;
       sum
----------------
  50000005000000
(1 row)
Time: 2521.217 ms
After:
# SELECT sum(x) FROM test_int8;
       sum
----------------
  50000005000000
(1 row)
Time: 1022.811 ms
CREATE TABLE test_int4 AS SELECT x::int4 FROM generate_series(1, 
10000000) x;
Before:
# SELECT var_samp(x) FROM test_int4;
       var_samp
--------------------
  8333334166666.6667
(1 row)
Time: 3808.546 ms
After:
# SELECT var_samp(x) FROM test_int4;
       var_samp
--------------------
  8333334166666.6667
(1 row)
Time: 1033.243 ms
Andreas
| Attachment | Content-Type | Size | 
|---|---|---|
| int128-agg-v1.patch | text/x-patch | 29.3 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2014-10-25 15:12:42 | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} | 
| Previous Message | Ali Akbar | 2014-10-25 13:58:01 | Re: Function array_agg(array) |