Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-07 06:54:48
Message-ID: CAH2-WzmP+QMea3eEa2UcYZDuyn9xb--JNR+qx-LF7rEwfRNLog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 6, 2019 at 10:15 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> After staring at the first patch for bit longer, a few things started to
> bother me:
>
> * The new struct is called BTScanInsert, but it's used for searches,
> too. It makes sense when you read the README, which explains the
> difference between "search scan keys" and "insertion scan keys", but now
> that we have a separate struct for this, perhaps we call insertion scan
> keys with a less confusing name. I don't know what to suggest, though.
> "Positioning key"?

I think that insertion scan key is fine. It's been called that for
almost twenty years. It's not like it's an intuitive concept that
could be conveyed easily if only we came up with a new, pithy name.

> * We store the binary search bounds in BTScanInsertData, but they're
> only used during insertions.
>
> * The binary search bounds are specific for a particular buffer. But
> that buffer is passed around separately from the bounds. It seems easy
> to have them go out of sync, so that you try to use the cached bounds
> for a different page. The savebinsrch and restorebinsrch is used to deal
> with that, but it is pretty complicated.

That might be an improvement, but I do think that using mutable state
in the insertion scankey, to restrict a search is an idea that could
work well in at least one other way. That really isn't a once-off
thing, even though it looks that way.

> I came up with the attached (against master), which addresses the 2nd
> and 3rd points. I added a whole new BTInsertStateData struct, to hold
> the binary search bounds. BTScanInsert now only holds the 'scankeys'
> array, and the 'nextkey' flag.

It will also have to store heapkeyspace, of course. And minusinfkey.
BTW, I would like to hear what you think of the idea of minusinfkey
(and the !minusinfkey optimization) specifically.

> The new BTInsertStateData struct also
> holds the current buffer we're considering to insert to, and a
> 'bounds_valid' flag to indicate if the saved bounds are valid for the
> current buffer. That way, it's more straightforward to clear the
> 'bounds_valid' flag whenever we move right.

I'm not sure that that's an improvement. Moving right should be very
rare with my patch. gcov shows that we never move right here anymore
with the regression tests, or within _bt_check_unique() -- not once.
For a second, I thought that you forgot to invalidate the bounds_valid
flag, because you didn't pass it directly, by value to
_bt_useduplicatepage().

> I made a copy of the _bt_binsrch, _bt_binsrch_insert. It does the binary
> search like _bt_binsrch does, but the bounds caching is only done in
> _bt_binsrch_insert. Seems more clear to have separate functions for them
> now, even though there's some duplication.

I'll have to think about that some more. Having a separate
_bt_binsrch_insert() may be worth it, but I'll need to do some
profiling.

> Hmm. Perhaps it would be to move the call to _bt_binsrch (or
> _bt_binsrch_insert with this patch) to outside _bt_findinsertloc. So
> that _bt_findinsertloc would only be responsible for finding the correct
> page to insert to. So the overall code, after patch #2, would be like:

Maybe, but as I said it's not like _bt_findinsertloc() doesn't know
all about unique indexes already. This is pointed out in a comment in
_bt_doinsert(), even. I guess that it might have to be changed to say
_bt_findinsertpage() instead, with your new approach.

> /*
> * Do the insertion. First move right to find the correct page to
> * insert to, if necessary. If we're inserting to a non-unique index,
> * _bt_search() already did this when it checked if a move to the
> * right was required for leaf page. Insertion scankey's scantid
> * would have been filled out at the time. On a unique index, the
> * current buffer is the first buffer containing duplicates, however,
> * so we may need to move right to the correct location for this
> * tuple.
> */
> if (checkingunique || itup_key->heapkeyspace)
> _bt_findinsertpage(rel, &insertstate, stack, heapRel);
>
> newitemoff = _bt_binsrch_insert(rel, &insertstate);
>
> _bt_insertonpg(rel, insertstate.buf, InvalidBuffer, stack, itup,
> newitemoff, false);
>
> Does this make sense?

I guess you're saying this because you noticed that the for (;;) loop
in _bt_findinsertloc() doesn't do that much in many cases, because of
the fastpath.

I suppose that this could be an improvement, provided all the
assertions that verify that the work "_bt_findinsertpage()" would have
done if called was in fact unnecessary. (e.g., check the high
key/rightmost-ness)

> The attached new version simplifies this, IMHO. The bounds and the
> current buffer go together in the same struct, so it's easier to keep
> track whether the bounds are valid or not.

I'll look into integrating this with my current draft v15 tomorrow.
Need to sleep on it.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2019-03-07 07:03:27 Re: Re: proposal: plpgsql pragma statement
Previous Message David Steele 2019-03-07 06:52:16 Re: Re: [HACKERS] proposal: schema variables