Re: WIP: Covering + unique indexes.

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
Subject: Re: WIP: Covering + unique indexes.
Date: 2018-03-24 19:39:20
Message-ID: CAPpHfdvrn6S_2ArObKtEErB3_Z9+rqKzzKQfJ6jryCoKoc=peA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 24, 2018 at 5:21 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Mar 22, 2018 at 8:23 AM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> >> * There is minor formatting issue in this part of code. Some spaces
> need
> >> to be replaced with tabs.
> >> +IndexTuple
> >> +index_truncate_tuple(Relation idxrel, IndexTuple olditup)
> >> +{
> >> + TupleDesc itupdesc =
> >> CreateTupleDescCopyConstr(RelationGetDescr(idxrel));
> >> + Datum values[INDEX_MAX_KEYS];
> >> + bool isnull[INDEX_MAX_KEYS];
> >> + IndexTuple newitup;
>
> The last time I looked at this patch, in April 2017, I made the point
> that we should add something like an "nattributes" argument to
> index_truncate_tuple() [1], rather than always using
> IndexRelationGetNumberOfKeyAttributes() within index_truncate_tuple().
> I can see that that change hasn't been made since that time.
>

+1, putting "nattributes" to argument of index_truncate_tuple() would
make this function way more universal.

With that approach, we can avoid relying on catalog metadata to the
> same degree, which was a specific concern that Tom had around that
> time. It's easy to do something with t_tid's offset, which is unused
> in internal page IndexTuples. We do very similar things in GIN, where
> alternative use of an IndexTuple's t_tid supports all kinds of
> enhancements, some of which were not originally anticipated. Alexander
> surely knows more about this than I do, since he wrote that code.
>

Originally that code was written by Teodor, but I also put my hands there.
In GIN entry tree, item pointers are stored in a posting list which is
located
after index tuple attributes. So, both t_tid block number and offset are
used for GIN needs.

Having this index_truncate_tuple() "nattributes" argument, and storing
> the number of attributes directly improves quite a lot of things:
>
> * It makes diagnosing issues in the field quite a bit easier. Tools
> like pg_filedump can do the right thing (Tom mentioned pg_filedump and
> amcheck specifically). The nbtree IndexTuple format should not need to
> be interpreted in a context-sensitive way, if we can avoid it.
>
> * It lets you use index_truncate_tuple() for regular suffix truncation
> in the future. These INCLUDE IndexTuples are really just a special
> case of suffix truncation. At least, they should be, because otherwise
> an eventual suffix truncation feature is going to be incompatible with
> the INCLUDE tuple format. *Not* doing this makes suffix truncation
> harder. Suffix truncation is a classic technique, first described by
> Bayer in 1977, and we are very probably going to add it someday.
>
> * Once you can tell a truncated IndexTuple from a non-truncated one
> with little or no context, you can add defensive assertions in various
> places where they're helpful. Similarly, amcheck can use and expect
> this as a cross-check against IndexRelationGetNumberOfKeyAttributes().
> This will increase confidence in the design, both initially and over
> time.
>

That makes sense. Let's store the number of tuple attributes to t_tid.
Assuming that our INDEX_MAX_KEYS is quite small number, we will have
higher bits of t_tid free for latter use.

I must say that I am disappointed that nothing has happened here,
> especially because this really wasn't very much additional work, and
> has essentially no downside. I can see that it doesn't work that way
> in the Postgres Pro fork [2], and diverging from that may
> inconvenience Postgres Pro, but that's a downside of forking. I don't
> think that the community should have to absorb that cost.
>

Sure, community shouldn't take care about Postgres Pro fork. If we find
that something is better to be done differently, than let us do it so.

> +Notes to Operator Class Implementors
> > +------------------------------------
> >
> > Besides I really appreciate it, it seems to be unrelated to the covering
> > indexes.
> > I'd like this to be extracted into separate patch and be committed
> > separately.
>
> Commit 3785f7ee, from last month, moved the original "Notes to
> Operator Class Implementors" section to the SGML docs. It looks like
> that README section was accidentally reintroduced during rebasing. The
> new information ("Included attributes in B-tree indexes") should be
> moved over to the new section of the user docs -- the section that
> 3785f7ee added.
>

Thank you for noticing that. I've overlooked that.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-24 20:33:21 Re: Why does load_external_function() return PGFunction?
Previous Message Tom Lane 2018-03-24 19:17:54 Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently