Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-09-16 21:11:29
Message-ID: CAH2-Wzkj9AgE1YJ=dXKajwiOM7_AqWpDs=9bDwowoQ=ycFLPRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 16, 2019 at 11:58 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I think that the problem here is that you didn't copy this old code
> from _bt_split() over to _bt_dedup_one_page():
>
> /*
> * Copy the original page's LSN into leftpage, which will become the
> * updated version of the page. We need this because XLogInsert will
> * examine the LSN and possibly dump it in a page image.
> */
> PageSetLSN(leftpage, PageGetLSN(origpage));
> isleaf = P_ISLEAF(oopaque);

I can confirm that this is what the problem was. Attached are two patches:

* A version of your v14 from today with a couple of tiny changes to
make it work against the current master branch -- I had to rebase the
patch, but the changes made while rebasing were totally trivial. (I
like to keep CFTester green.)

* The second patch actually fixes the PageSetLSN() thing, setting the
temp page buffer's LSN to match the original page before any real work
is done, and before XLogInsert() is called. Just like _bt_split().

The test case now shows exactly what you reported for "FPWs off" when
FPWs are turned on, at least on my machine and with my checkpoint
settings. That is, there are *zero* FPIs/FPWs, so the final nbtree
volume is 2128 MB. This means that the volume of additional WAL
required over what the master branch requires for the same test case
is very small (2128 MB compares well with master's 2011 MB of WAL).
Maybe we could do better than 2128 MB with more work, but this is
definitely already low enough overhead to be acceptable. This also
passes "make check-world" testing.

However, my usual wal_consistency_checking smoke test fails pretty
quickly with the two patches applied:

3634/2019-09-16 13:53:22 PDT FATAL: inconsistent page found, rel
1663/16385/2673, forknum 0, blkno 13
3634/2019-09-16 13:53:22 PDT CONTEXT: WAL redo at 0/3202370 for
Btree/DEDUPLICATE: items were deduplicated to 12 items
3633/2019-09-16 13:53:22 PDT LOG: startup process (PID 3634) exited
with exit code 1

Maybe the lack of the PageSetLSN() thing masked a bug in WAL replay,
since without that we effectively always just replay FPIs, never truly
relying on redo. (I didn't try wal_consistency_checking without the
second patch, but I assume that you did, and found no problems for
this reason.)

Can you produce a new version that integrates the PageSetLSN() thing,
and fixes this bug?

Thanks
--
Peter Geoghegan

Attachment Content-Type Size
v141-0002-Add-_bt_split-style-WAL-optimization.patch application/octet-stream 1.1 KB
v141-0001-v14-0001-Add-deduplication-to-nbtree.patch-from.patch application/octet-stream 127.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-09-16 21:22:51 Re: [proposal] de-TOAST'ing using a iterator
Previous Message Tom Lane 2019-09-16 21:10:25 Re: Define jsonpath functions as stable