Making empty Bitmapsets always be NULL

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Making empty Bitmapsets always be NULL
Date: 2023-02-28 21:59:48
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

When I designed the Bitmapset module, I set things up so that an empty
Bitmapset could be represented either by a NULL pointer, or by an
allocated object all of whose bits are zero. I've recently come to
the conclusion that that was a bad idea and we should instead have
a convention like the longstanding invariant for Lists: an empty
list is represented by NIL and nothing else.

To do this, we need to fix bms_intersect, bms_difference, and a couple
of other functions to check for having produced an empty result; but
then we can replace bms_is_empty() by a simple NULL test. I originally
guessed that that would be a bad tradeoff, but now I think it likely
is a win performance-wise, because we call bms_is_empty() many more
times than those other functions put together.

However, any performance gain would likely be marginal; the real
reason why I'm pushing this is that we have various places that have
hand-implemented a rule about "this Bitmapset variable must be exactly
NULL if empty", so that they can use checks-for-null in place of
bms_is_empty() calls in particularly hot code paths. That is a really
fragile, mistake-prone way to do things, and I'm surprised that we've
seldom been bitten by it. It's not well documented at all which
variables have this property, so you can't readily tell which code
might be violating those conventions.

So basically I'd like to establish that convention everywhere and
get rid of these ad-hoc reduce-to-NULL checks. I put together the
attached draft patch to do so. I've not done any hard performance
testing on it --- I did do one benchmark that showed maybe 0.8%
speedup, but I'd regard that as below the noise.

I found just a few places that have issues with this idea. One thing
that is problematic is bms_first_member(): assuming you allow it to
loop to completion, it ends with the passed Bitmapset being empty,
which is now an invariant violation. I made it pfree the argument
at that point, and fixed a couple of callers that would be broken
thereby; but I wonder if it wouldn't be better to get rid of that
function entirely and convert all its callers to use bms_next_member.
There are only about half a dozen.

I also discovered that nodeAppend.c is relying on bms_del_members
not reducing a non-empty set to NULL, because it uses the nullness
of appendstate->as_valid_subplans as a state boolean. That was
probably acceptable when it was written, but whoever added
classify_matching_subplans made a hash of the state invariants here,
because that can set as_valid_subplans to empty. I added a separate
boolean as an easy way out, but maybe that code could do with a more
thorough revisit.

I'll add this to the about-to-start CF.

regards, tom lane

Attachment Content-Type Size
v1-make-empty-bitmapsets-always-be-NULL.patch text/x-diff 21.6 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Sandro Santilli 2023-02-28 22:13:55 Re: Ability to reference other extensions by schema in extension scripts
Previous Message Alexander Korotkov 2023-02-28 21:08:58 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()