Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-11-04 19:52:14
Message-ID: CAH2-WznUHDAqUwUMya_KCJ6rNBuG3XZfU-rpyUM-LwO_BUhS2g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 30, 2019 at 7:39 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v20, which adds a custom strategy for the checkingunique
> (unique index) case to _bt_dedup_one_page(). It also makes
> deduplication the default for both unique and non-unique indexes. I
> simply altered your new BtreeDefaultDoDedup() macro from v19 to make
> nbtree use deduplication wherever it is safe to do so. This default
> may not be the best one in the end, though deduplication in unique
> indexes now looks very compelling.

Attached is v21, which fixes some bitrot -- v20 of the patch was made
totally unusable by today's commit 8557a6f1. Other changes:

* New datum_image_eq() patch fixes up datum_image_eq() to work with
cstring/name columns, which we rely on. No need for a Valgrind
suppression anymore. The suppression only papered over a legitimate
complaint: datum_image_eq() would not really work properly with
cstring datums. We fix the underlying problem with 8557a6f1 and the
v21-0001-* patch.

* New nbtdedup.c file added. This has all of the functions that dealt
with deduplication and posting lists that were previously in
nbtinsert.c and nbtutils.c. I think that this separation is somewhat
cleaner.

* Additional tweaks to the custom checkingunique algorithm used by
deduplication. This is based on further tuning from benchmarking. This
is certainly not final yet.

* Greatly simplified the code for unique index LP_DEAD killing in
_bt_check_unique(). This was pretty sloppy in v20 of the patch (it had
two "goto" labels). Now it works with the existing loop conditions
that advance to the next equal item on the page.

* Additional adjustments to the nbtree.h comments about the on-disk format.
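
To illustrate the datum_image_eq() item above, here is a minimal,
standalone C sketch of why a NUL-terminated cstring needs its own
"image equality" rule. It is not taken from the patch. The function
names cstring_image_eq() and fixed_width_image_eq() are hypothetical,
and the assumption is that the comparison should cover only the bytes
up to and including the terminator rather than a fixed width.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): compare two
 * NUL-terminated strings by image. Only the bytes up to and including
 * the terminator are examined, so whatever follows the terminator in
 * either buffer is never read.
 */
bool
cstring_image_eq(const char *s1, const char *s2)
{
    size_t len1 = strlen(s1) + 1;   /* include the terminating NUL */
    size_t len2 = strlen(s2) + 1;

    if (len1 != len2)
        return false;
    return memcmp(s1, s2, len1) == 0;
}

/*
 * Hypothetical counter-example: a fixed-width comparison also looks at
 * the bytes after the terminator, so two equal strings held in buffers
 * with different trailing bytes compare as unequal.
 */
bool
fixed_width_image_eq(const char *b1, const char *b2, size_t width)
{
    return memcmp(b1, b2, width) == 0;
}
```

With two 8-byte buffers that both hold "ab" followed by different
trailing bytes, cstring_image_eq() reports equality while
fixed_width_image_eq() does not. Reading trailing bytes that were never
initialized is also the kind of access a tool like Valgrind reports.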

Can you take a quick look at the first patch (the v21-0001-* patch),
Anastasia? I would like to get that one out of the way soon.

--
Peter Geoghegan

Attachment                                                 Content-Type         Size
v21-0001-Teach-datum_image_eq-about-cstring-datums.patch   application/x-patch  2.0 KB
v21-0003-DEBUG-Add-pageinspect-instrumentation.patch       application/x-patch  8.6 KB
v21-0002-Add-deduplication-to-nbtree.patch                 application/x-patch  158.3 KB
