Re: Sparse bit set data structure

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Sparse bit set data structure
Date: 2019-03-14 15:37:16
Message-ID: CAOBaU_ZS9bPogDHPs8KhP+vQNZayY2Z5rOc5PdP4R0zRwnZ6Ag@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 13, 2019 at 8:18 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I started to consider rewriting the data structure into something more
> like B-tree. Then I remembered that I wrote a data structure pretty much
> like that last year already! We discussed that on the "Vacuum: allow
> usage of more than 1GB of work mem" thread [2], to replace the current
> huge array that holds the dead TIDs during vacuum.
>
> So I dusted off that patch, and made it more general, so that it can be
> used to store arbitrary 64-bit integers, rather than ItemPointers or
> BlockNumbers. I then added a rudimentary form of compression to the leaf
> pages, so that clusters of nearby values can be stored as an array of
> 32-bit integers, or as a bitmap. That would perhaps be overkill, if it
> was just to conserve some memory in GiST vacuum, but I think this will
> turn out to be a useful general-purpose facility.

I had a quick look at it, and thought some first comments could be helpful.

+ * If you change this, you must recalculate MAX_INTERVAL_LEVELS, too!
+ * MAX_INTERNAL_ITEMS ^ MAX_INTERNAL_LEVELS >= 2^64.

I think that MAX_INTERVAL_LEVELS was a typo for MAX_INTERNAL_LEVELS,
which has probably been renamed to MAX_TREE_LEVELS in this patch.
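For what it's worth, the invariant in that comment is easy to re-derive when the fanout changes. Here is a minimal standalone sketch, not taken from the patch: the function name min_tree_levels is made up for illustration, and the fanout values mentioned afterwards are arbitrary examples rather than the patch's actual MAX_INTERNAL_ITEMS.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): return the smallest number of
 * levels such that fanout ^ levels >= 2^64, i.e. enough levels to address
 * any 64-bit integer.
 */
static int
min_tree_levels(uint64_t fanout)
{
	uint64_t	capacity = 1;	/* fanout ^ (levels - 1), always < 2^64 */
	int			levels = 0;

	assert(fanout >= 2);

	for (;;)
	{
		levels++;

		/*
		 * capacity * fanout >= 2^64 exactly when capacity exceeds
		 * UINT64_MAX / fanout, so test that instead of overflowing.
		 */
		if (capacity > UINT64_MAX / fanout)
			return levels;
		capacity *= fanout;
	}
}
```

With a fanout of 64 this gives 11 levels (64^10 = 2^60 is too small, 64^11 = 2^66 is enough), and with a fanout of 256 it gives 8.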

+ * with varying levels of "compression". Which one is used depending on the
+ * values stored.

depends on?

+ if (newitem <= sbs->last_item)
+ elog(ERROR, "cannot insert to sparse bitset out of order");

Is there any reason to disallow inserting duplicates? AFAICT nothing
prevents that in the current code. If that's intended, it should
probably be documented.
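To illustrate the distinction, here is a rough standalone sketch of the two possible behaviours. Everything in it is hypothetical: toy_set, toy_add and the ignore_duplicates flag are stand-ins I made up, it returns a status code where the patch uses elog(ERROR), and it is not how the patch is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the set; it only tracks what the check needs. */
typedef struct
{
	uint64_t	last_item;		/* largest value added so far */
	uint64_t	num_items;		/* number of values accepted */
} toy_set;

typedef enum
{
	ADD_OK,						/* value was appended */
	ADD_DUPLICATE,				/* same as last value, silently skipped */
	ADD_OUT_OF_ORDER			/* rejected */
} add_result;

/*
 * Append newitem.  With ignore_duplicates = false this mirrors the strict
 * "newitem <= last_item" check; with true, repeating the last value is
 * tolerated and only a smaller value is rejected.
 */
static add_result
toy_add(toy_set *set, uint64_t newitem, bool ignore_duplicates)
{
	if (set->num_items > 0)
	{
		if (newitem < set->last_item)
			return ADD_OUT_OF_ORDER;
		if (newitem == set->last_item)
			return ignore_duplicates ? ADD_DUPLICATE : ADD_OUT_OF_ORDER;
	}

	set->last_item = newitem;
	set->num_items++;
	return ADD_OK;
}
```

Either behaviour seems fine to me, as long as the header comment says which one callers get.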

Nothing else struck me. That's a pretty nice new lib :)
