| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Small and unlikely overflow hazard in bms_next_member() |
| Date: | 2026-04-13 00:23:26 |
| Message-ID: | 5E98992A-62D3-4961-B2F1-61246DB9D814@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Apr 12, 2026, at 21:17, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 3 Apr 2026 at 15:08, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> IMO, if we can make bitmapset.c work with INT_MAX members and get a
>> performance increase, then we should do it.
>
> Re-thinking this after a week's holiday, it seems fine to use an
> unsigned 32-bit int rather than a 64-bit int to fix this bug. I'd
> previously been uncertain if there were any guarantees in C to what
> (unsigned int) -1 would return, but going by [1] at 6.3.1.3, it says:
>
> "Otherwise, if the new type is unsigned, the value is converted by
> repeatedly adding or subtracting one more than the maximum value that
> can be represented in the new type until the value is in the range of
> the new type."
>
> So, it seems even on one's complement that -1 as an unsigned int will
> be UINT_MAX. When we add 1 to UINT_MAX, we're guaranteed to get 0, as
> it's unsigned maths and overflows are going to result in a value
> modulus the max value for the type.
>
> That leads me to the attached v2 patch. Compiler Explorer link showing
> the assembly at [2].
>
> Testing the performance, it's better than master, so I got rid of the
> size_t wordnum stuff. We're post-freeze now, so fiddling with other
> optimisations seems a bit off the table as there appears to be no
> performance regression to compensate for now, per:
>
> drowley(at)amd3990x:~$ gcc test_bms_next3.c -O2 -o test_bms_next3 &&
> ./test_bms_next3
> Benchmarking 100000000 bms_next_member iterations...
> master: 1.18330 seconds
> Patched: 1.05493 seconds
>
> Benchmarking 100000000 bms_prev_member iterations...
> master: 2.94522 seconds
> Patched: 1.86130 seconds
>
> drowley(at)amd3990x:~$ clang test_bms_next3.c -O2 -o test_bms_next3 &&
> ./test_bms_next3
> Benchmarking 100000000 bms_next_member iterations...
> master: 1.07860 seconds
> Patched: 1.07896 seconds
>
> Benchmarking 100000000 bms_prev_member iterations...
> master: 2.76550 seconds
> Patched: 2.12369 seconds
>
> David
>
> [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
> [2] https://godbolt.org/z/xW96rxd3P
> <test_bms_next3.c><v2-0001-Fix-unlikely-overflow-bug-in-bms_next_member.patch>
I just tried test_bms_next3 on my MacBook, looks like patched bms_prev_member is much faster there. I ran about 10 times, the results were consistent.
```
chaol(at)ChaodeMacBook-Air test % gcc test_bms_next3.c -O2 -o test_bms_next3
test_bms_next3.c:281:18: warning: format specifies type 'long' but the argument has type 'int64' (aka 'long long') [-Wformat]
281 | printf("%ld\n", count);
| ~~~ ^~~~~
| %lld
1 warning generated.
chaol(at)ChaodeMacBook-Air test % ./test_bms_next3
Benchmarking 100000000 bms_next_member iterations...
master: 0.49287 seconds
Patched: 0.47315 seconds
Benchmarking 100000000 bms_prev_member iterations...
master: 1.04644 seconds
Patched: 0.58053 seconds
2800000000
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-04-13 00:37:55 | test_compression, test module for low-level compression APIs (for 2b5ba2a0a141) |
| Previous Message | Michael Paquier | 2026-04-13 00:16:29 | Re: Fix pgstat_database.c to honor passed database OIDs |