From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Improving and extending int128.h to more of numeric.c |
Date: | 2025-07-16 18:23:51 |
Message-ID: | CAEZATCWTqJN4Zg5xXffi7K3BU5i2SFoMnZJ9FHvzms7bPCnfrA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 16 Jul 2025 at 10:02, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> > 0005 is the main patch. It adds a few more functions to int128.h and
> > uses them in numeric.c to allow various functions (mainly aggregate
> > functions) to use 128-bit integers unconditionally on all platforms.
> > This applies to the following aggregates:
> >
> > - sum(int8)
> > - avg(int8)
> > - stddev_pop(int4)
> > - stddev_samp(int4)
> > - var_pop(int4)
> > - var_samp(int4)
>
> > Testing on a 32-bit system without native int128 support, I see
> > something like a 1.3-1.5x speedup in a couple of simple queries using
> > those aggregates.
>
> With v5, I don't see any difference from master when building on x86
> with -m32 for these queries:
Thanks for testing!
> select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
> select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;
The patch won't make any difference to those because "i" is numeric in
those queries, and the patch doesn't touch sum(numeric) or
var_pop(numeric).
> Which queries were you testing?
I used the following 2 queries:
SELECT count(*), sum(x), avg(x)
FROM generate_series(1::bigint, 10000000::bigint) g(x);
SELECT count(*), var_pop(x), var_samp(x), stddev_pop(x), stddev_samp(x)
FROM generate_series(1::int, 10000000::int) g(x);
On 64-bit Linux with gcc 14.2 and native int128 support disabled I got
the following results:
Query 1:
HEAD: 1404.096 ms
Patch: 992.818 ms
Query 2:
HEAD: 1498.949 ms
Patch: 935.654 ms
And on a 32-bit Linux VM I got:
Query 1:
HEAD: 2465.202 ms
Patch: 1874.590 ms
Query 2:
HEAD: 2491.991 ms
Patch: 1682.992 ms
I didn't originally try "-m32" on 64-bit Linux because I wasn't sure
how realistic a test that would be, but doing that now I get:
Query 1:
HEAD: 1830.652 ms
Patch: 1411.438 ms
Query 2:
HEAD: 1882.299 ms
Patch: 1299.546 ms
> (Also, unrelated to the patch set, but I was surprised to find
> replacing the numeric expressions above with bigint ones
> (10_000_000_000 etc) makes the queries at least 5 times slower, and
> that's true with a normal 64-bit build as well.)
Hmm, are you sure? I don't see that. With -m32, I see:
select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
HEAD: 204.774 ms
Patch: 204.206 ms
(not expecting any difference)
select sum(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
HEAD: 187.426 ms
Patch: 140.741 ms
(as expected, faster than the previous query in HEAD because bigint
generate_series should be faster than numeric generate_series, and
faster still with the sum(bigint) optimisations made by this patch)
select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;
HEAD: 228.386 ms
Patch: 226.712 ms
(not expecting any difference)
select var_pop(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
HEAD: 211.749 ms
Patch: 210.870 ms
(as expected, faster than previous query because of bigint
generate_series, but the patch makes no difference because it doesn't
touch var_pop(bigint))
And another query:
select sum(i::bigint) from generate_series(1e10, 1e10+1e6, 1) i;
HEAD: 271.888 ms
Patch: 227.898 ms
(as expected, slower than the pure numeric version in HEAD because of
the cast, while still using numeric in the aggregate, but with a
decent speedup from the patch, using INT128 in the aggregate)
Regards,
Dean
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2025-07-16 18:27:32 | Re: index prefetching |
Previous Message | Peter Geoghegan | 2025-07-16 18:18:54 | Re: index prefetching |