Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-14 11:00:25
Lists: pgsql-hackers

On 13/03/2019 03:28, Peter Geoghegan wrote:
> It would be great if you could take a look at the 'Add high key
> "continuescan" optimization' patch, which is the only one you haven't
> commented on so far (excluding the amcheck "relocate" patch, which is
> less important). I can put that one off for a while after the first 3
> go in. I will also put off the "split after new item" commit for at
> least a week or two. I'm sure that the idea behind the "continuescan"
> patch will now seem pretty obvious to you -- it's just taking
> advantage of the high key when an index scan on the leaf level (which
> uses a search style scankey, not an insertion style scankey) looks
> like it may have to move to the next leaf page, but we'd like to avoid
> it where possible. Checking the high key there is now much more likely
> to result in the index scan not going to the next page, since we're
> more careful when considering a leaf split point these days. The high
> key often looks like the items on the page to the right, not the items
> on the same page.

Oh yeah, that makes perfect sense. I wonder why we haven't done it like
that before? The new page split logic makes it more likely to help, but
even without that, I don't see any downside.

I find it a bit confusing that the logic is now split between
_bt_checkkeys() and _bt_readpage(). For a forward scan, _bt_readpage()
does the high-key check, but the corresponding "first-key" check in a
backward scan is done in _bt_checkkeys(). I'd suggest moving the logic
completely to _bt_readpage(), so that it's in one place. With that,
_bt_checkkeys() can always check the keys as it's told, without looking
at the LP_DEAD flag. Something like the attached patch.
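
To make the high-key idea concrete, here is a minimal toy sketch in C. It is not the code from the patch and does not use the real nbtree structures: ToyLeafPage, toy_checkkey() and toy_readpage_forward() are hypothetical names, keys are plain ints, and the scan qual is assumed to be a simple "key <= upper_bound". It only shows the shape of the argument: the page-reading routine owns the high-key check, and the per-key routine just checks what it is told to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model of a leaf page (not the real nbtree layout).
 * Assumption: keys are sorted ascending, every key on this page is
 * <= high_key, and no key on any page to the right is < high_key.
 */
typedef struct ToyLeafPage
{
    const int  *keys;           /* sorted keys stored on this page */
    size_t      nkeys;
    bool        has_high_key;   /* false on the rightmost page */
    int         high_key;
} ToyLeafPage;

/*
 * Check one key against the qual "key <= upper_bound". Like the real
 * per-tuple check, it clears *continuescan once a key proves that no
 * later key in scan order can match.
 */
static bool
toy_checkkey(int key, int upper_bound, bool *continuescan)
{
    if (key <= upper_bound)
        return true;
    *continuescan = false;
    return false;
}

/*
 * Forward scan over one page. Returns true if the scan must move on
 * to the right sibling, false if it can stop here. The high-key check
 * lives here, in the page-level routine, in one place.
 */
static bool
toy_readpage_forward(const ToyLeafPage *page, int upper_bound,
                     size_t *nmatches)
{
    bool        continuescan = true;

    assert(page != NULL && nmatches != NULL);
    *nmatches = 0;

    for (size_t i = 0; i < page->nkeys && continuescan; i++)
    {
        if (toy_checkkey(page->keys[i], upper_bound, &continuescan))
            (*nmatches)++;
    }

    /*
     * Every key on the page matched. Before stepping right, test the
     * high key: if it already fails the qual, nothing on the right
     * sibling can match either, so the next page need not be visited.
     */
    if (continuescan && page->has_high_key)
        (void) toy_checkkey(page->high_key, upper_bound, &continuescan);

    return continuescan;
}
```

With keys {10, 20, 30} and a high key of 50, a scan for "key <= 40" matches all three items and still stops on this page, because the high key fails the qual. A scan for "key <= 60" has to continue to the right sibling.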

- Heikki

Attachment: v16-heikki-0001-Add-high-key-continuescan-optimization.patch (text/x-patch, 12.7 KB)
