Re: Sparse bit set data structure

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Sparse bit set data structure
Date: 2019-03-14 08:42:49
Message-ID: dbf83e18-a3f8-145e-1d28-d7e07cb391a2@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14/03/2019 07:15, Andrey Borodin wrote:
>> 14 марта 2019 г., в 0:18, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> написал(а):
>> <0001-Add-SparseBitset-to-hold-a-large-set-of-64-bit-ints-.patch><0002-Andrey-Borodin-s-test_blockset-tool-adapted-for-Spar.patch>
>
> That is very interesting idea. Basically, B-tree and radix tree is a tradeoff between space and time.
>
> In general, lookup into radix tree will touch less CPU cache lines.
> In this terms Bitmapset is on most performant and memory-wasteful side: lookup into Bitmapset is always 1 cache line.
> Performance of radix tree can be good in case of skewed distribution, while B-tree will be OK on uniform. I think that distribution of GiST inner pages is uniform, distribution of empty leafs is... I have no idea, let's consider uniform too.

Yeah. In this implementation, the leaf nodes are packed into bitmaps
when possible, so it should perform quite well on skewed distributions, too.

> I'd review this data structure ASAP. I just need to understand: do we aim to v12 or v13? (I did not solve concurrency issues in GiST VACUUM yet, but will fix them at weekend)

I'm aiming v12 with this. It's a fairly large patch, but it's very
isolated. I think the most pressing issue is getting the rest of the
GiST vacuum patch fixed. If you get that fixed over the weekend, I'll
take another look at it on Monday.

> Also, maybe we should consider using RoaringBitmaps? [0]
> As a side not I would add that while balanced trees are widely used for operations on external memory, there are more performant versions for main memory. Like AVL-tree and RB-tree.

Hmm. Yeah, this is quite similar to Roaring Bitmaps. Roaring bitmaps
also have a top-level, at which you binary search, and "leaf" nodes
which can be bitmaps or arrays. In a roaring bitmap, the key space is
divided into fixed-size chunks, like in a radix tree, but different from
a B-tree.

Even if we used AVL-trees or RB-trees or something else for the top
layers of the tree, I think at the bottom level, we'd still need to use
sorted arrays or bitmaps, to get the density we want. So I think the
implementation at the leaf level would look pretty much the same, in any
case. And the upper levels don't take very much space, regardless of the
implementation. So I don't think it matters much.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2019-03-14 08:47:23 RE: Timeout parameters
Previous Message Imai, Yoshikazu 2019-03-14 08:35:09 RE: speeding up planning with partitions