Improving and extending int128.h to more of numeric.c

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Improving and extending int128.h to more of numeric.c
Date: 2025-06-23 08:01:23
Message-ID: CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
Lists: pgsql-hackers

Attached are some improvements to include/common/int128.h, including
some new functions that allow it to be used more widely in numeric.c.

In particular, this allows various aggregates to use 128-bit integers
regardless of whether they're natively supported, which should improve
the performance on platforms lacking native 128-bit support, and it
also significantly simplifies a lot of numeric code, by making it the
same on all platforms.

0001 is a trivial bug fix for the test code in src/tools/testint128.c
-- it was using "union" instead of "struct" for test128.hl, which
meant that it was only ever setting and checking half of each 128-bit
integer in the tests.
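
To illustrate the kind of mistake 0001 fixes, here is a minimal sketch. The type and function names are hypothetical stand-ins, not the actual definitions in testint128.c. In a union the two 64-bit members overlap, so writing one overwrites the other, whereas a struct keeps them as separate halves.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the real testint128.c types. */
typedef union
{
    uint64_t    lo;
    int64_t     hi;
} halves_as_union;      /* buggy layout: lo and hi share the same 8 bytes */

typedef struct
{
    uint64_t    lo;
    int64_t     hi;
} halves_as_struct;     /* intended layout: two separate 8-byte halves */

/* Returns 1 if both halves keep the values written to them. */
static int
struct_halves_independent(void)
{
    halves_as_struct s;

    s.lo = 1;
    s.hi = 2;
    return s.lo == 1 && s.hi == 2;
}

/* Same check for the union: the write to hi clobbers lo. */
static int
union_halves_independent(void)
{
    halves_as_union u;

    u.lo = 1;
    u.hi = 2;
    return u.lo == 1 && u.hi == 2;
}
```

With the union layout, a test that sets and then compares both members is really only exercising 64 bits of state.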

0002 is a bit of preparatory refactoring of int128.h -- instead of
having all the native implementations at the top of the file, and the
non-native implementations at the bottom, this brings them together
(more like include/common/int.h). IMO, this makes it easier to work
on: the native and non-native code are now adjacent inside each
function body, there's no need to duplicate every function comment
and declaration, and it's easy to see that every function has both
implementations. Also, if we ever wanted to hand-code a
particular function to be the same on all platforms, it would be
easier with the file laid out this way. Although this means there are
now more #if's and #else's, it reduces the overall file size, and IMO
improves readability and maintainability.
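
A rough sketch of the layout described for 0002 is shown here, with one comment and one declaration per function and both implementations side by side in the body. All names are illustrative, and the use of the GCC/Clang macro __SIZEOF_INT128__ to detect native support is an assumption made for this sketch. PostgreSQL uses its own configure-time check.

```c
#include <stdint.h>

/*
 * Assumption: __SIZEOF_INT128__ stands in for the real build-time check.
 * The sketch_* names are hypothetical, not taken from int128.h.
 */
#if defined(__SIZEOF_INT128__)
#define SKETCH_NATIVE_INT128 1
typedef __int128 sketch_int128;
#else
#define SKETCH_NATIVE_INT128 0
typedef struct
{
    uint64_t    lo;     /* least significant 64 bits */
    int64_t     hi;     /* most significant 64 bits, carries the sign */
} sketch_int128;
#endif

/* Convert a signed 64-bit value to 128 bits. */
static inline sketch_int128
sketch_from_int64(int64_t v)
{
#if SKETCH_NATIVE_INT128
    return (sketch_int128) v;
#else
    sketch_int128 r;

    r.lo = (uint64_t) v;
    r.hi = (v < 0) ? -1 : 0;    /* sign-extend into the high half */
    return r;
#endif
}

/* Return the low 64 bits. */
static inline uint64_t
sketch_lo(sketch_int128 x)
{
#if SKETCH_NATIVE_INT128
    return (uint64_t) x;
#else
    return x.lo;
#endif
}

/* Return the high 64 bits (native path assumes arithmetic right shift). */
static inline int64_t
sketch_hi(sketch_int128 x)
{
#if SKETCH_NATIVE_INT128
    return (int64_t) (x >> 64);
#else
    return x.hi;
#endif
}
```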

0003 optimises the non-native addition code. Specifically, the test
for whether it needs to propagate a carry to the high part can be made
much simpler by noting that the low-part addition is unsigned integer
arithmetic, which is just modular arithmetic, so all it needs to do is
check for modular wrap-around, which can be done with a single "new <
old" test. In addition, it's possible to code this in a way that is
typically branchless, and produces the same machine code as the native
int128 code (e.g., an ADD and an ADC instruction). For me, this
significantly reduces the runtime of testint128 (from 31s to 16s).
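
The "new < old" carry test can be sketched as follows. This is a minimal illustration under assumed names (sketch128, sketch128_add_int64), not the code from the patch. Since the low half is unsigned, the addition wraps modulo 2^64, and a wrap happened exactly when the new low half is smaller than the old one.

```c
#include <stdint.h>

/* Hypothetical two-word representation of a signed 128-bit integer. */
typedef struct
{
    uint64_t    lo;     /* least significant 64 bits */
    int64_t     hi;     /* most significant 64 bits */
} sketch128;

/* Add a signed 64-bit value to a 128-bit accumulator. */
static inline void
sketch128_add_int64(sketch128 *acc, int64_t v)
{
    uint64_t    old_lo = acc->lo;

    /* Unsigned addition: wraps modulo 2^64, never undefined. */
    acc->lo += (uint64_t) v;

    /*
     * High half: add the sign extension of v (-1 or 0), plus a carry of 1
     * if the low half wrapped around, detected by "new < old".
     */
    acc->hi += ((v < 0) ? -1 : 0) + (acc->lo < old_lo);
}
```

The comparison yields 0 or 1 and is added directly, so the function body contains no explicit branch on the carry.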

0004 simplifies the non-native multiplication code a bit by using
signed integer multiplication for the first three product terms, which
simplifies the code needed to add the products to the result. Looking
on godbolt.org, this typically leads to significantly smaller output,
with less branching, though I found it only gave around a 3%
improvement to the runtime of testint128. Nonetheless, I still think
it's worth doing, to make the code simpler and more readable.
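
As an illustration of the idea in 0004, here is a sketch of a signed 64x64-bit multiply built from four 32-bit partial products. The names are hypothetical and this is not the patch's code. It splits each operand into a signed high half and an unsigned low half, so the three products involving a high half can use signed multiplication and be added without separate sign fix-ups. It assumes that right-shifting a negative signed value is an arithmetic shift, as on mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical two-word representation of a signed 128-bit integer. */
typedef struct
{
    uint64_t    lo;
    int64_t     hi;
} sketch128;

/* Add t * 2^32 to *r, where t is a signed 64-bit cross product. */
static inline void
sketch128_add_shifted32(sketch128 *r, int64_t t)
{
    uint64_t    old_lo = r->lo;

    r->hi += t >> 32;               /* floor(t / 2^32), keeps the sign */
    r->lo += (uint64_t) t << 32;    /* low 32 bits of t, moved up */
    r->hi += (r->lo < old_lo);      /* carry from the low half */
}

/* Full 128-bit product of two signed 64-bit values. */
static inline sketch128
sketch128_mul_64x64(int64_t x, int64_t y)
{
    int32_t     x_hi = (int32_t) (x >> 32);     /* signed high half */
    uint32_t    x_lo = (uint32_t) x;            /* unsigned low half */
    int32_t     y_hi = (int32_t) (y >> 32);
    uint32_t    y_lo = (uint32_t) y;
    sketch128   r;

    /* x = x_hi * 2^32 + x_lo, and likewise for y. */
    r.hi = (int64_t) x_hi * (int64_t) y_hi;     /* signed, weight 2^64 */
    r.lo = (uint64_t) x_lo * (uint64_t) y_lo;   /* unsigned, weight 1 */

    /* The two signed cross terms, each with weight 2^32. */
    sketch128_add_shifted32(&r, (int64_t) x_hi * (int64_t) y_lo);
    sketch128_add_shifted32(&r, (int64_t) x_lo * (int64_t) y_hi);
    return r;
}
```

Each cross term has magnitude below 2^63, so it fits in an int64_t, and only the low-by-low product needs the full unsigned 64-bit range.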

0005 is the main patch. It adds a few more functions to int128.h and
uses them in numeric.c to allow various functions (mainly aggregate
functions) to use 128-bit integers unconditionally on all platforms.
This applies to the following aggregates:

- sum(int8)
- avg(int8)
- stddev_pop(int4)
- stddev_samp(int4)
- var_pop(int4)
- var_samp(int4)

Excluding the new test code, 0005 gives a slight net reduction in the
total line count, and eliminates nearly all "#ifdef HAVE_INT128"
conditional code from numeric.c, making it significantly simpler and
easier to follow.

Testing on a 32-bit system without native int128 support, I see
something like a 1.3-1.5x speedup in a couple of simple queries using
those aggregates.

Regards,
Dean

Attachment                                                        Content-Type   Size
v1-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patch   text/x-patch   3.4 KB
v1-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patch   text/x-patch   5.4 KB
v1-0003-Optimise-non-native-128-bit-addition-in-int128.h.patch    text/x-patch   3.3 KB
v1-0005-Extend-int128.h-to-support-more-numeric-code.patch        text/x-patch   38.0 KB
v1-0001-Fix-incorrectly-defined-test128-union-in-testint1.patch   text/x-patch   813 bytes
