Sparse bit set data structure

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Sparse bit set data structure
Date: 2019-03-13 19:18:12
Message-ID: b5e82599-1966-5783-733c-1a947ddb729f@iki.fi
Lists: pgsql-hackers

Hi,

I was reviewing Andrey Borodin's patch for GiST VACUUM [1], which
includes a new "block set" data structure, to track internal and empty
pages while vacuuming a GiST index. The blockset data structure is a
pretty simple radix tree that can hold a set of BlockNumbers.

The memory usage of the radix tree would probably be good enough in real
life, as we also discussed on the thread. Nevertheless, I was somewhat
bothered by it, so I did some measurements. I added some
MemoryContextStats() calls to Andrey's test_blockset module, to print
out memory usage.

For storing 5000000 random 32-bit integers, or a density of about 1% of
bits set, the blockset consumed about 380 MB of memory. I think that's a
pretty realistic ratio of internal pages : leaf pages on a GiST index,
so I would hope the blockset to be efficient in that ballpark. However,
380 MB / 5000000 is about 76 bytes, so it's consuming about 76 bytes per
stored block number. That's a lot of overhead! For comparison, a plain
BlockNumber is just 4 bytes. With more sparse sets, it is even more
wasteful, on a per-item basis, although the total memory usage will of
course be smaller. (To be clear, no one is pushing around GiST indexes
with anywhere near 2^32 blocks, or 32 TB, but the per-BlockNumber stats
are nevertheless interesting.)

I started to consider rewriting the data structure into something more
like a B-tree. Then I remembered that I wrote a data structure pretty much
like that last year already! We discussed that on the "Vacuum: allow
usage of more than 1GB of work mem" thread [2], to replace the current
huge array that holds the dead TIDs during vacuum.

So I dusted off that patch, and made it more general, so that it can be
used to store arbitrary 64-bit integers, rather than ItemPointers or
BlockNumbers. I then added a rudimentary form of compression to the leaf
pages, so that clusters of nearby values can be stored as an array of
32-bit integers, or as a bitmap. That would perhaps be overkill if it
were just to conserve some memory in GiST vacuum, but I think this will
turn out to be a useful general-purpose facility.

I plugged this new "sparse bitset" implementation into the same
test_blockset test. The memory usage for 5000000 values is now just over
20 MB, or about 4.3 bytes per value. That's much more reasonable than
the 76 bytes.

I'll do some more performance testing on this, to make sure it performs
well enough on random lookups to also replace VACUUM's dead item
pointer array. Assuming that works out, I plan to polish up and commit
this, and use it in the GiST vacuum. I'm also tempted to change VACUUM
to use this, since that should be pretty straightforward once we have
the data structure.

[1]
https://www.postgresql.org/message-id/A51F64E3-850D-4249-814E-54967103A859%40yandex-team.ru

[2]
https://www.postgresql.org/message-id/8e5cbf08-5dd8-466d-9271-562fc65f133f%40iki.fi

- Heikki

Attachment Content-Type Size
0001-Add-SparseBitset-to-hold-a-large-set-of-64-bit-ints-.patch text/x-patch 27.8 KB
0002-Andrey-Borodin-s-test_blockset-tool-adapted-for-Spar.patch text/x-patch 12.3 KB
