Re: Pluggable storage

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable storage
Date: 2017-07-19 18:53:42
Message-ID: CAH2-Wzn=6Lc3OVA88x=E6SKG72ojNUE6ut6RZAqNnQx-YLcw=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 19, 2017 at 10:56 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I strongly agree. I simply don't understand how you can adopt UNDO for
>> MVCC, and yet expect to get a benefit commensurate with the effort
>> without also implementing "retail index tuple deletion" first.
>
> I agree that we need retail index tuple deletion. I liked Claudio's
> idea at http://postgr.es/m/CAGTBQpZ-kTRQiAa13xG1GNe461YOwrA-s-ycCQPtyFrpKTaDBQ@mail.gmail.com
> -- that seems indispensible to making retail index tuple deletion
> reasonably efficient. Is anybody going to work on getting that
> committed?

I will do review work on it.

IMV the main problems are:

* The way a "header" is added at the PageAddItemExtended() level,
rather than making heap TID something much closer to a conventional
attribute that perhaps only nbtree and indextuple.c have special
knowledge of, strikes me as the wrong way to go.

* It's simply not acceptable to add overhead to *all* internal items.
That kills fan-in. We're going to need suffix truncation for the
common case where the user-visible attributes for a split point/new
high key at the leaf level sufficiently distinguish what belongs on
either side. IOW, you should only see internal items with a heap TID
in the uncommon case where you have so many duplicates at the leaf
level that you have no choice put to use a split point that's right in
the middle of many duplicates.

Fortunately, if we confine ourselves to making heap TID part of the
keyspace, the code can be far simpler than what would be needed to get
my preferred, all-encompassing design for suffix truncation [1] to
work. I think we could just stash the number of attributes
participating in a comparison within internal pages' unused item
pointer offset. I've talked about this before, in the context of
Anastasia's INCLUDED columns patch. If we can have a variable number
of attributes for heap tuples, we can do so for index tuples, too.

* We might also have problems with changing the performance
characteristics for the worse in some cases by some measures. This
will probably technically increase the amount of bloat for some
indexes with sparse deletion patterns. I think that that will be well
worth it, but I don't expect a slam dunk.

A nice benefit of this work is that it lets us kill the hack that adds
randomness to the search for free space among duplicates, and may let
us follow the Lehman & Yao algorithm more closely.

[1] https://wiki.postgresql.org/wiki/Key_normalization#Suffix_truncation_of_normalized_keys
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robins Tharakan 2017-07-19 18:54:15 Re: Patch: Add --no-comments to skip COMMENTs with pg_dump
Previous Message Tom Lane 2017-07-19 18:36:21 Re: Something for the TODO list: deprecating abstime and friends