Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-09-09 19:54:08
Message-ID: CAH2-Wzm=9TnAFGCDfvsBVC5zYonQqeLMmYpnx=xZ3nyXOeHjNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 2, 2019 at 6:53 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attach is v10, which fixes the Valgrind issue.

Attached is v11, which makes the kill_prior_tuple optimization work
with posting list tuples. The only catch is that it can only work when
all "logical tuples" within a posting list are known-dead, since of
course there is only one LP_DEAD bit available for each posting list.

The hardest part of this kill_prior_tuple work was writing the new
_bt_killitems() code, which I'm still not 100% happy with. Still, it
seems to work well -- new pageinspect LP_DEAD status info was added to
the second patch to verify that we're setting LP_DEAD bits as needed
for posting list tuples. I also had to add a new nbtree-specific,
posting-list-aware version of index_compute_xid_horizon_for_tuples()
-- _bt_compute_xid_horizon_for_tuples(). Finally, it was necessary to
avoid splitting a posting list with the LP_DEAD bit set. I took a
naive approach to avoiding that problem, adding code to
_bt_findinsertloc() to prevent it. Posting list splits are generally
assumed to be rare, so the fact that this is slightly inefficient
should be fine IMV.

I also refactored deduplication itself in anticipation of making the
WAL logging more efficient, and incremental. So, the structure of the
code within _bt_dedup_one_page() was simplified, without really
changing it very much (I think). I also fixed a bug in
_bt_dedup_one_page(). The check for dead items was broken in previous
versions, because the loop examined the high key tuple in every
iteration.

Making _bt_dedup_one_page() more efficient and incremental is still
the most important open item for the patch.

--
Peter Geoghegan

Attachment Content-Type Size
v11-0001-Add-deduplication-to-nbtree.patch application/octet-stream 115.8 KB
v11-0002-DEBUG-Add-pageinspect-instrumentation.patch application/octet-stream 8.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-09-09 20:28:10 Re: [PATCH] kNN for btree
Previous Message Alexander Korotkov 2019-09-09 19:47:37 Re: Bug in GiST paring heap comparator