Re: [PATCH] Add tests for Bitmapset

From: Greg Burd <greg(at)burd(dot)me>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PATCH] Add tests for Bitmapset
Date: 2025-09-19 18:48:43
Message-ID: CD368B71-2947-407D-8967-181A4CB4A985@greg.burd.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Sep 19 2025, at 2:36 am, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Sep 18, 2025 at 02:50:45PM -0400, Greg Burd wrote:
>> Thanks for all the feedback and time spent reviewing this patch. I
>> switched out the encode/decode functions to use the nodeToString() and
>> stringToNode() functions and change all the SQL testing function
>> signatures to TEXT from BYTEA. This exercises more code and that's good
>> for testing purposes. I took out the code that checks the hash values,
>> I can't think of a reason that should cause a future failure should that
>> change and output different values. I re-integrated the earlier random
>> testing function along with the newer code. I'm not sure this buys us
>> much if anything.
>>
>> coverage: HEAD v7
>> lines......: 87.1 90.5 +3.4%
>> functions..: 100.0 100.0 +0.0%
>> branches...: 63.5 75.9 +12.4%
>
> Nathan has given me his blessing, so as I could look at this patch and
> put my hands on it. I have spotted one bug, and things that could be
> done a bit better. Please see below.

Hey Michael,

Thank you for taking the initiative and reviewing the code.

> As far as I am concerned there is a mistake with bms_overlap_list().
> This API checks a Bitmapset with a list of integers, but you have
> coded things so as it checks for a list of Bitmapset. I think that
> test_bms_overlap_list() should take in input a int4 array, then the
> code in charge of the conversion from the array to the list needs to
> be tweaked a little bit. With the current code, we could get random
> failures with negative members included in a Bitmapset, depending on
> what's on the stack.

Apologies, you're right. I'm not sure how I went astray on that, but
it's fixed now.

> I am not really convinced about the value brought by bms_values() now
> that the output is generated for all the SQL functions using the
> "decode" and "encode" routines (which may be actually less confusing
> if renamed as "bms_to_text" and "text_to_bms", but that's a personal
> take), because this results in doing an extra round-trip between text
> and Bitmapset.

Agreed, renamed and removed. New macros are:

/* Encode/Decode to/from TEXT and Bitmapset */
#define BITMAPSET_TO_TEXT(bms) (text *) CStringGetTextDatum(nodeToString((bms)))
#define TEXT_TO_BITMAPSET(str) (Bitmapset *)
stringToNode(TextDatumGetCString((Datum) (str)))

> The tests don't rely much on NULL as input to
> translate that as "(b)". bmsToString() is just a direct call to what
> the decode and encode routines do.

Yes, agreed.

> Same remark about test_bms_from_array(), that mimicks nodeRead(), in
> charge of doing a int4[] -> Bitmapset conversion. Wouldn't it be
> simpler to just use "(b 1 2 3)" in input and let the decode/encode
> routines do the job?

That makes a lot of sense now that things are all string to/from the
Bitmapset node representation. Done.

> Let's replace the test descriptions in the queries that we know have
> an individual result by comments. For example all the "Range
> operations". These descriptions are useful in the UNION queries,
> where we want to rely on a set of Bitmapsets for multiple operations,
> to be able to know the description of the result. For individual
> queries, this bloats the output.

Okay, done.

> The numbering in the tests are not that useful to have. If the test
> suite is updated, that would mean more updates required if we add a
> new block in the middle.

Agreed, removed.

> +SELECT 'overlapping ranges' as test,
> + bms_values(test_bms_union(
> + test_bms_add_range(NULL, 50, 150),
> + test_bms_add_range(NULL, 100, 200)
>
> The output generated by this query leads to a bitmapset with 200
> members. It would be simpler and more readable to reduce these
> ranges, or is there a reason for these to be large? Same remark for
> "difference subtraction edge case" with its 50 members.

Some of these were targeted at corner cases to increase coverage. I'll
review and consider ways to either a) eliminate them, or b) shorten the output.

> +SELECT 'add_range large range' as test,
> bms_values(test_bms_add_range(NULL, 0, 1000)) as result;
>
> This one also is funky with its output of 1000 elements. Any reason
> this is justified in terms of coverage? If yes, we could use some
> substr() calls to get the end and the beginning of the output
> generated, perhaps to reduce the output of the whole, or the length,
> or something like that relevant enough to validate the output. There
> is a second test a bit down that uses a range of [1000,1100], as well.

I used length() and recorded the expected output in the test. That
seems to a) reduce noise in the output and b) remain deterministic(-ish).

> --
> Michael

I think I've incorporated your feedback which did indeed make the patch
more readable/crisp and maintainable, thank you.

coverage: HEAD v7 v8
lines......: 87.1 90.5 92.2
functions..: 100.0 100.0 100.0
branches...: 63.5 75.9 76.2

best.

-greg

Attachment Content-Type Size
v8-0001-Add-a-module-that-tests-Bitmapset.patch application/octet-stream 60.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2025-09-19 19:58:53 Re: allow benign typedef redefinitions (C11)
Previous Message Matthias van de Meent 2025-09-19 17:28:42 Re: Unnecessary calculations in Execproject