Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-11-08 18:35:57
Message-ID: CAH2-Wzna4A3ZCdq2F+rqCZk9kX9ubPWbsPs2QYg-EtzQ7HU0Ng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 4, 2019 at 11:52 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v21, which fixes some bitrot -- v20 of the patch was made
> totally unusable by today's commit 8557a6f1. Other changes:

There is more bitrot, so I attach v22. This also has some new changes
centered around fixing particular issues with space utilization. These
changes are:

* nbtsort.c now intelligently considers the contribution of suffix
truncation of posting list tuples when considering whether or not a
leaf page is "full". I mean "full" in the sense that it has exceeded
the soft limit (fillfactor-wise limit) on space utilization for the
page (no change in how the hard limit in _bt_buildadd() works).

We don't usually bother predicting the space saving from suffix
truncation when considering split points, even in nbtsplitloc.c, but
it's worth making an exception for posting lists (actually, this is
the same exception that nbtsplitloc.c already had in much earlier
versions of the patch). Posting lists are very often large enough to
really make a big contribution to how balanced free space is. I now
observe that weird cases where CREATE INDEX packs leaf pages too empty
(or too full) are now all but eliminated. CREATE INDEX now does a
pretty good job of respecting leaf fillfactor, while also allowing
deduplication to be very effective (CREATE INDEX did neither of these
two things in earlier versions of the patch).

* Added "single value" strategy for retail insert deduplication --
this is closely related to nbtsplitloc.c's single value strategy.

The general idea is that _bt_dedup_one_page() anticipates that a
future "single value" page split is likely to occur, and therefore
limits deduplication after two "1/3 of a page"-wide posting lists at
the start of the page. It arranges for deduplication to leave a neat
split point for nbtsplitloc.c to use when the time comes. In other
words, the patch now allows "single value" page splits to leave leaf
pages BTREE_SINGLEVAL_FILLFACTOR% full, just like v12/master. Leaving
a small amount of free space on pages that are packed full of
duplicates is always a good idea. Also, we no longer force page splits
to leave pages 2/3 full (only two large posting lists plus a high
key), which sometimes happened with v21. On balance, this change seems
to slightly improve space utilization.

In general, it's now unusual for retail insertions to get better space
utilization than CREATE INDEX -- in that sense normality/balance has
been restored in v22. Actually, I wrote the v22 changes by working
through a list of weird space utilization issues from my personal
notes. I'm pretty sure I've fixed all of those -- only nbtsplitloc.c's
single value strategy wants to split at a point that leaves a heap TID
in the new high key for the page, so that's the only thing we need to
worry about within nbtdedup.c.

* "deduplication" storage parameter now has psql completion.

I intend to push the datum_image_eq() preparatory patch soon. I will
also push a commit that makes _bt_keep_natts_fast() use
datum_image_eq() separately. Anybody have an opinion on that?

--
Peter Geoghegan

Attachment Content-Type Size
v22-0001-Teach-datum_image_eq-about-cstring-datums.patch application/octet-stream 2.0 KB
v22-0003-DEBUG-Add-pageinspect-instrumentation.patch application/octet-stream 8.6 KB
v22-0002-Add-deduplication-to-nbtree.patch application/octet-stream 168.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-11-08 18:52:46 Re: errbacktrace
Previous Message Ranier VF 2019-11-08 17:41:40 Patch avoid call strlen repeatedly in loop.