Re: Making empty Bitmapsets always be NULL

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Making empty Bitmapsets always be NULL
Date: 2023-03-07 04:06:42
Message-ID: CAApHDvq9eq0W_aFUGrb6ba28ieuQN4zM5Uwqxy7+LMZjJc+VGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 4 Mar 2023 at 11:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > On Fri, 3 Mar 2023 at 15:17, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > It's true that the majority of Bitmapsets are going to be just 1 word,
> > but it's important to acknowledge that we do suffer in some more
> > extreme cases when Bitmapsets become large. Partition with large
> > numbers of partitions is one such case.
>
> Maybe, but optimizing for that while pessimizing every other case
> doesn't sound very attractive from here. I think we need some
> benchmarks on normal-size bitmapsets before considering doing much
> in this area.

After thinking about this again and looking at the code, I'm not
really sure where the pessimism has been added. For the changes made
to say bms_equal(), there was already a branch that checked the nwords
column so that we could find the shorter and longer sets out of the
two input sets. That's been replaced with a comparison of both input
set's nwords, which does not really seem any more expensive. For
bms_compare() we needed to do Min(a->nwords, b->nwords) to find the
shortest set, likewise for bms_nonempty_difference() and
bms_is_subset(). That does not seem less expensive than the
replacement code.

I think the times where we have sets that we do manage to trim down
the nword count with that we actually end up having to expand again
are likely fairly rare.

I also wondered if we could shave off a few instructions by utilising
the knowledge that nwords is never 0. That would mean that some of
the loops could be written as:

i = 0; do { <stuff>; } while (++i < set->nwords);

instead of:

for (i = 0; i < set->nwords; i++) { <stuff>; }

if we do assume that the vast majority of sets are nwords==1 sets,
then this reduces the loop condition checks by half for all those.

I see that gcc manages to optimize: for (i = 0; i < set->nwords || i
== 0; i++) { <stuff>; } into the same code as the do while loop. clang
does not seem to manage that.

> Also, if we're going to make any sort of changes here it'd probably
> behoove us to make struct Bitmapset private in bitmapset.c, so that
> we can have confidence that nobody is playing games with them.
> I had a go at that and was pleasantly surprised to find that
> actually nobody has; the attached passes check-world. It'd probably
> be smart to commit this as a follow-on to 00b41463c, whether or not
> we go any further.

That seems like a good idea. This will give us extra reassurance that
nobody is making up their own Bitmapsets through some custom function
that don't follow the rules.

> Also, given that we do this, I don't think that check_bitmapset_invariants
> as you propose it is worth the trouble.

I wondered if maybe just Assert(a == NULL || a->words[a->nwords - 1]
!= 0); would be worthwhile anywhere. However, I don't see any
particular function that is more likely to catch those errors, so it's
maybe not worth doing anywhere if we're not doing it everywhere.

I adjusted the patch to remove the invariant checks and fixed up a
couple of things I'd missed. The 0002 patch changes the for loops
into do while loops. I wanted to see if we could see any performance
gains from doing this.

The performance numbers are nowhere near as stable as I'd like them to
have been, but testing the patch shows:

Test 1:

setup:
create table t1 (a int) partition by list(a);
select 'create table t1_'||x||' partition of t1 for values
in('||x||');' from generate_series(0,9)x;
\gexec

Test 1's sql:
select * from t1 where a > 1 and a < 3;

for i in {1..3}; do pgbench -n -f test1.sql -T 15 postgres | grep tps; done

master (cf96907aad):
tps = 29534.189309
tps = 30465.722545
tps = 30328.290553

master + 0001:
tps = 28915.174536
tps = 29817.950994
tps = 29387.084581

master + 0001 + 0002:
tps = 29438.216512
tps = 29951.905408
tps = 31445.191414

Test 2:

setup:
create table t2 (a int) partition by list(a);
select 'create table t2_'||x||' partition of t2 for values
in('||x||');' from generate_series(0,9999)x;
\gexec

Test 2's sql:
select * from t2 where a > 1 and a < 3;

for i in {1..3}; do pgbench -n -f test2.sql -T 15 postgres | grep tps; done

master (cf96907aad):
tps = 28470.504990
tps = 29175.450905
tps = 28123.699176

master + 0001:
tps = 28056.256805
tps = 28380.401746
tps = 28384.395217

master + 0001 + 0002:
tps = 29365.992219
tps = 28418.374923
tps = 28303.924129

Test 3:

setup:
create table t3a (a int primary key);
create table t3b (a int primary key);

Test 3's sql:
select * from t3a inner join t3b on t3a.a = t3b.a;

for i in {1..3}; do pgbench -n -f test3.sql -T 15 postgres | grep tps; done

master (cf96907aad):
tps = 20458.710550
tps = 20527.898929
tps = 20284.165277

master + 0001:
tps = 20700.340713
tps = 20571.913956
tps = 20541.771589

master + 0001 + 0002:
tps = 20046.674601
tps = 20016.649536
tps = 19487.999853

I've attached a graph of this too. It shows that there might be a
small increase in performance with tests 1 and 2. It seems like test 3
regresses a bit. I suspect this might just be a code arrangement issue
as master + 0001 is faster than 0001 + 0002 for that test.

David

Attachment Content-Type Size
v1-0001-Remove-trailing-zero-words-from-Bitmapsets.patch application/octet-stream 9.7 KB
v1-0002-Use-do-while-loops-instead-of-for-loops.patch application/octet-stream 8.9 KB
image/png 59.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-03-07 04:23:00 RE: [Proposal] Add foreign-server health checks infrastructure
Previous Message shiy.fnst@fujitsu.com 2023-03-07 03:46:48 RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher