From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-07-19 19:32:18
Message-ID: CAH2-WzkejfjFohkYyaqGG8N7FXj+UV3_HjjsJqKiMi1MWfAtow@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 19, 2019 at 10:53 AM Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> Patch 0002 (must be applied on top of 0001) implements preservation of
> correct TID order inside the posting list when inserting new tuples.
> This version passes all regression tests, including the amcheck test.
> I also used the following script to test insertion into the posting list:

Nice!
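
For what it's worth, the invariant itself is simple enough to sketch.
Here is roughly how I picture the insertion-offset search within a
posting list (a sketch only, assuming TIDs are stored as a plain
ItemPointerData array; the patch's actual accessors will differ):

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * Sketch: find the offset at which a new heap TID belongs within an
 * existing posting list, so that TIDs stay in strictly ascending
 * order.  'tids'/'ntids' stand in for whatever representation the
 * patch actually uses.
 */
static int
posting_insert_offset(ItemPointer tids, int ntids, ItemPointer newtid)
{
    int     low = 0;
    int     high = ntids;

    while (low < high)
    {
        int     mid = low + (high - low) / 2;

        if (ItemPointerCompare(&tids[mid], newtid) < 0)
            low = mid + 1;
        else
            high = mid;
    }

    /* caller shifts the tail and inserts newtid at offset 'low' */
    return low;
}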

> I suppose it is not the final version of the patch yet,
> so I left some debug messages and TODO comments to ease review.

I'm fine with leaving them in. I have sometimes distributed a separate
patch with debug messages, but now that I think about it, that
probably wasn't a good use of time.

You will probably want to remove at least some of the debug messages
during performance testing. I'm thinking of code that appears in very
tight inner loops, such as the _bt_compare() code.

> Please, in your review, pay particular attention to the usage of
> BTreeTupleGetHeapTID. For posting tuples it returns the first TID from
> the posting list, like BTreeTupleGetMinTID, but maybe some callers are
> not ready for that and want BTreeTupleGetMaxTID instead.
> Incorrect usage of these macros may cause subtle bugs, which are
> probably not covered by tests. So, please double-check it.
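
To state the hazard concretely: with TIDs kept in ascending order, the
first TID is the minimum and the last is the maximum, so any caller
that treats the first TID as representative of the whole tuple can go
wrong. A sketch of the distinction (the layout and names are
illustrative, not the patch's actual macros):

/*
 * Sketch: a caller asking "does this posting tuple cover this heap
 * TID?" must look at both ends of the list.  Using only the first
 * TID (the BTreeTupleGetHeapTID behavior described above) misses
 * TIDs later in the list.
 */
static bool
posting_range_covers(ItemPointer tids, int ntids, ItemPointer tid)
{
    return ItemPointerCompare(&tids[0], tid) <= 0 &&        /* min */
        ItemPointerCompare(&tids[ntids - 1], tid) >= 0;     /* max */
}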

One testing strategy that I plan to use for the patch is to
deliberately corrupt a compressed index in a subtle way using
pg_hexedit, and then see if amcheck detects the problem. For example,
I may swap the order of two TIDs in the middle of a posting list,
something that is unlikely to produce wrong answers to queries, and
won't even be detected by the "heapallindexed" check, but
is still wrong. If we can detect very subtle, adversarial corruption
like this, then we can detect any real-world problem.
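
Concretely, the core of the check amcheck needs for that case is tiny
(a sketch; the real code has to walk the patch's actual posting list
representation):

/*
 * Sketch: every TID in a posting list must be strictly greater than
 * its predecessor.  Swapping two mid-list TIDs with pg_hexedit, as
 * described above, fails this check immediately.
 */
static void
check_posting_order(ItemPointer tids, int ntids)
{
    for (int i = 1; i < ntids; i++)
    {
        if (ItemPointerCompare(&tids[i - 1], &tids[i]) >= 0)
            ereport(ERROR,
                    (errcode(ERRCODE_INDEX_CORRUPTED),
                     errmsg("posting list TIDs out of order at offset %d", i)));
    }
}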

Once we have confidence in amcheck's ability to detect problems with
posting lists in general, we can use it in many different contexts
without much thought. For example, we'll probably need to do
long-running benchmarks to validate the performance of the patch. It's easy
to add amcheck testing at the end of each run. Every benchmark is now
also a correctness/stress test, for free.

> Next week I'm going to check performance and try to find specific
> scenarios where this
> feature can lead to degradation and measure it, to understand if we need
> to make this deduplication optional.

Sounds good, though I think it might be a bit too early to decide
whether or not it needs to be enabled by default. For one thing, the
approach to WAL-logging within _bt_compress_one_page() is probably
fairly inefficient, which may be a problem for certain workloads. It's
okay to leave it that way for now, because it is not relevant to the
core design of the patch. I'm sure that _bt_compress_one_page() can be
carefully optimized when the time comes.
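
To be concrete about what I mean by inefficient (this is a guess at
the current shape, not a review of the actual code): if each merged
tuple currently gets its own WAL record, the whole pass could instead
be covered by a single record for the page. A sketch, where
XLOG_BTREE_DEDUP_PAGE is an invented info code:

/* Rewrite the page with posting tuples, then log it exactly once. */
START_CRIT_SECTION();

/* ... replace runs of duplicates with posting tuples in place ... */

MarkBufferDirty(buf);

if (RelationNeedsWAL(rel))
{
    XLogRecPtr  recptr;

    XLogBeginInsert();
    XLogRegisterBuffer(0, buf, REGBUF_STANDARD | REGBUF_FORCE_IMAGE);
    recptr = XLogInsert(RM_BTREE_ID, XLOG_BTREE_DEDUP_PAGE);
    PageSetLSN(BufferGetPage(buf), recptr);
}

END_CRIT_SECTION();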

My current focus is not on the raw performance itself. For now, I am
focused on making sure that the compression works well, and that the
resulting indexes "look nice" in general. FWIW, the first few versions
of my v12 work on nbtree didn't actually make *anything* go faster. It
took a couple of months to fix the more important regressions, and a
few more months to fix all of them. I think that the work on this
patch may develop in a similar way. I am willing to accept regressions
in the unoptimized code during development because it seems likely
that you have the right idea about the data structure itself, which is
the one thing that I *really* care about. Once you get that right, the
remaining problems are very likely to either be fixable with further
work on optimizing specific code, or a price that users will mostly be
happy to pay to get the benefits.

--
Peter Geoghegan
