Re: Sparse bit set data structure

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Sparse bit set data structure
Date: 2019-03-20 16:20:11
Message-ID: CAOBaU_bVYKYXSyXB5_ub-Nue52=XCmVtMYFuAVXJ8L0TT7V_ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 20, 2019 at 2:10 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 14/03/2019 17:37, Julien Rouhaud wrote:
>
> > + if (newitem <= sbs->last_item)
> > + elog(ERROR, "cannot insert to sparse bitset out of order");
> >
> > Is there any reason to disallow inserting duplicates? AFAICT nothing
> > prevents that in the current code. If that's intended, that probably
> > should be documented.
>
> Yeah, we could easily allow setting the last item again. It would be a
> no-op, though, which doesn't seem very useful. It would be useful to
> lift the limitation that the values have to be added in ascending order,
> but current users that we're thinking of don't need it. Let's add that
> later, if the need arises.
>
> Or did you mean that the structure would be a "bag" rather than a "set",
> so that it would keep the duplicates? I don't think that would be good.
> I guess the vacuum code that this will be used in wouldn't care either
> way, but "set" seems like a more clean concept.

Yes, I was thinking about "bag". For a set, allowing inserting
duplicates is indeed a no-op and should be pretty cheap with almost no
extra code for that. Maybe VACUUM can't have duplicate, but is it
that unlikely that other would need it? I'm wondering if just
requiring to merge multiple such structure isn't going to be needed
soon for instance. If that's not wanted, I'm still thinking that a
less ambiguous error should be raised.

> I'm now pretty satisfied with this. Barring objections, I'll commit this
> in the next few days. Please review, if you have a chance.

You're defining SIMPLE8B_MAX_VALUE but never use it. Maybe you wanted
to add an assert / explicit test in intset_add_member()?

/*
* We buffer insertions in a simple array, before packing and inserting them
* into the B-tree. MAX_BUFFERED_VALUES sets the size of the buffer. The
* encoder assumes that it is large enough, that we can always fill a leaf
* item with buffered new items. In other words, MAX_BUFFERED_VALUES must be
* larger than MAX_VALUES_PER_LEAF_ITEM.
*/
#define MAX_BUFFERED_VALUES (MAX_VALUES_PER_LEAF_ITEM * 2)

The *2 is not immediately obvious here (at least it wasn't to me),
maybe explaining intset_flush_buffered_values() main loop rationale
here could be worthwhile.

Otherwise, everything looks just fine!

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-20 16:43:11 Re: Automated way to find actual COMMIT LSN of subxact LSN
Previous Message Peter Eisentraut 2019-03-20 16:02:34 Re: pg_basebackup ignores the existing data directory permissions