Re: [PATCH] Add tests for Bitmapset

From: Greg Burd <greg(at)burd(dot)me>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Add tests for Bitmapset
Date: 2025-09-17 13:17:59
Message-ID: 4792B7EE-F449-47AD-9078-C827D43F1EEE@greg.burd.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Sep 16 2025, at 11:25 pm, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Tue, Sep 16, 2025 at 03:04:39PM -0400, Greg Burd wrote:
>> I've re-written the set of tests as suggested (I hope!). I've not
>> re-run the coverage report (yet) to ensure we're at least as well
>> covered as v3, but attached is v4 for your inspection (amusement?).
>> I've tried to author the SQL tests such that failures clearly indicate
>> what's at gone wrong.
>
> Thanks for the update, that's easier to follow, even if it's more
> code overall. It looks like all the 29 APIs are present, plus the
> extras for the array mapping routines required for the tests with the
> list APIs.

Thank you for the continued detailed feedback, I think we're zeroing in
on a solid patch.

> +static void
> +elog_bitmapset(int elevel, const char *label, const Bitmapset *bms)
>
> This routine, that's able to translate a bytea into a readable form,
> is also something that I would have added as an extra function, then
> used it in some of the tests where I would care about the bitmap in
> output generated after a given operation, removing the DEBUG parts
> that I guess have also a risk of rotting over time. That would make
> the tests for simple operations that generate a bms as a result in
> easier to read, because we would call less SQL calls. For example,
> take the "add_member idempotent" case, that calls twice
> test_bms_make_singleton() to compare the output byteas, we could just
> do a test_bms_add_member(test_bms_make_singleton(10), 10) and show the
> output?

I added a function bms_values() that prints out the values contained in
a bitmapset and changed a few tests to use it. I removed the
elog_bitmapset() as it isn't being used and now that it's easy to output
the set in SQL why keep it for debugging?

> We could perhaps use the CTE style for the basic calls, to reduce the
> test_bms_make_singleton() calls. What you are doing feels OK as well,
> just a thought in passing.

I changed a few tests I saw with the pattern you mentioned. If you identify
additional tests you feel should be re-written please let me know.

> FWIW, I tend to write the regression test queries so as these are
> limited to 72-80 characters per line, to fit even in a small
> terminals. That's just an old-school take, even if it means more
> lines in the SQL input file.

I have a vt-100 sitting behind me, I get it. I tend to try to line-wrap
at 78. I tried a few times to re-format the SQL accordingly and I found
that it became harder to understand and less readable IMO. I'd prefer to
keep it as is, unless there is a hard requirement in which case I'll
reformat it.

> Perhaps it would be worth documenting what's the value of
> test_random_operations? It is a mean of testing add, del and
> is_member with a modulo 1000 of the member number. With the other
> functions that offer deterministic tests, I am not sure what's the
> extra value gained in it.

That was my thinking and why I didn't have a test that used random data
to start with (v1). I'd rather just remove it, but I'm open to ideas.
I could replace it with the earlier version of the function.

> + bms1_data = PG_GETARG_BYTEA_PP(0);
> + if (VARSIZE_ANY_EXHDR(bms1_data) > 0)
> + {
> + bms1 = (Bitmapset *) palloc(VARSIZE_ANY_EXHDR(bms1_data));
> + memcpy(bms1, VARDATA_ANY(bms1_data), VARSIZE_ANY_EXHDR(bms1_data));
> + }
>
> This pattern is repeated 9 times, if my fingers got it right. Perhaps
> hide it in a macro, or invent an extra static inline routine that does
> the translation?

That was an oversight on my part, all of those have been reduced to use
the same pattern with encode/decode bms() functions.

> +DO $$
> +BEGIN
> + BEGIN
> + PERFORM test_bms_singleton_member(test_bms_from_array(ARRAY[1,2]));
> + RAISE NOTICE 'ERROR: singleton_member([1,2]) should have failed!';
>
> Do we need all these DO blocks to catch failures? These cases bump on
> elog(ERROR) in the internals of bitmapset.c, which are not something
> the end-users would expect, but in a test module that's fine to
> trigger. The function call would just fail, which is fine.

Sure, that's easy enough.

> The queries with CTEs and UNION ALL are elegant, nice.

Thanks.

>> This exercise turned into a lot more LOC than I'd expected, I hope it
>> provides some value to the community for testing this important data
>> structure and helps to codify the API clearly.
>
> It does, at least that's my opinion. So thanks for caring about this
> level of testing.

Again, thank YOU(!) for caring enough to review the work.

> --
> Michael

Don't forget, we expect the cfbot/CI to fail on 32bit systems and tell
us the hash value we need to record in the next version of the patch.
Or, if we don't think testing the hash functions is worth it (and I'm on
the fence on this one) then I can just remove it.

best.

-greg

Attachment Content-Type Size
v5-0001-Add-a-module-that-tests-Bitmapset.patch application/octet-stream 69.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Junwang Zhao 2025-09-17 13:18:23 Re: REPACK and naming
Previous Message David G. Johnston 2025-09-17 13:17:56 REPACK and naming