Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-10 18:53:08
Message-ID: CAH2-WznoF2kjn2KBOvDzWZRDfA_2pBFoQap3b6qv7fpwpmyN5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 10, 2019 at 7:09 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> "descendrighttrunc" doesn't make much sense to me, either. I don't
> understand it. Maybe a comment would make it clear, though.

It's not an easily grasped concept. I don't think that any name will
easily convey the idea to the reader, though. I'm happy to go with
whatever name you prefer.

> I don't feel like this is an optimization. It's natural consequence of
> what the high key means. I guess you can think of it as an optimization,
> in the same way that not fully scanning the whole index for every search
> is an optimization, but that's not how I think of it :-).

I would agree with this in a green field situation, where we don't
have to consider the legacy of v3 indexes. But that's not the case
here.

> If we don't flip the meaning of the flag, then maybe calling it
> something like "searching_for_leaf_page" would make sense:
>
> /*
> * When set, we're searching for the leaf page with the given high key,
> * rather than leaf tuples matching the search keys.
> *
> * Normally, when !searching_for_pivot_tuple, if a page's high key

I guess you meant to say "searching_for_pivot_tuple" both times (not
"searching_for_leaf_page"). After all, we always search for a leaf
page. :-)

I'm fine with "searching_for_pivot_tuple", I think. The underscores
are not really stylistically consistent with other stuff in nbtree.h,
but I can use something very similar to your suggestion that is
consistent.

> * has truncated columns, and the columns that are present are equal to
> * the search key, the search will not descend to that page. For
> * example, if an index has two columns, and a page's high key is
> * ("foo", <omitted>), and the search key is also ("foo," <omitted>),
> * the search will not descend to that page, but its right sibling. The
> * omitted column in the high key means that all tuples on the page must
> * be strictly < "foo", so we don't need to visit it. However, sometimes
> * we perform a search to find the parent of a leaf page, using the leaf
> * page's high key as the search key. In that case, when we search for
> * ("foo", <omitted>), we do want to land on that page, not its right
> * sibling.
> */
> bool searching_for_leaf_page;

That works for me (assuming you meant searching_for_pivot_tuple).

> As the patch stands, you're also setting minusinfkey when dealing with
> v3 indexes. I think it would be better to only set
> searching_for_leaf_page in nbtpage.c.

That would mean I would have to check both heapkeyspace and
minusinfkey within _bt_compare(). I would rather just keep the
assertion that makes sure that !heapkeyspace callers are also
minusinfkey callers, and the comments that explain why that is. It
might even matter to performance -- having an extra condition in
_bt_compare() is something we should avoid.

> In general, I think BTScanInsert
> should describe the search key, regardless of whether it's a V3 or V4
> index. Properties of the index belong elsewhere. (We're violating that
> by storing the 'heapkeyspace' flag in BTScanInsert. That wart is
> probably OK, it is pretty convenient to have it there. But in principle...)

The idea with pg_upgrade'd v3 indexes is, as I said a while back, that
they too have a heap TID attribute. nbtsearch.c code is not allowed to
rely on its value, though, and must use
minusinfkey/searching_for_pivot_tuple semantics (relying on its value
being minus infinity is still relying on its value being something).

Now, it's also true that there are a number of things that we have to
do within nbtinsert.c for !heapkeyspace that don't really concern the
key space as such. Even still, thinking about everything with
reference to the keyspace, and keeping that as similar as possible
between v3 and v4 is a good thing. It is up to high level code (such
as _bt_first()) to not allow a !heapkeyspace index scan to do
something that won't work for it. It is not up to low level code like
_bt_compare() to worry about these differences (beyond asserting that
caller got it right). If page deletion didn't need minusinfkey
semantics (if nobody but v3 indexes needed that), then I would just
make the "move right of separator" !minusinfkey code within
_bt_compare() test heapkeyspace. But we do have a general need for
minusinfkey semantics, so it seems simpler and more future-proof to
keep heapkeyspace out of low-level nbtsearch.c code.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2019-03-10 18:58:07 Re: Batch insert in CTAS/MatView code
Previous Message Andrew Dunstan 2019-03-10 18:40:15 Re: subscriptionCheck failures on nightjar