Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-07 07:58:58
Message-ID: d66ba339-86eb-fb95-a123-92aec00816a7@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/03/2019 15:41, Heikki Linnakangas wrote:
> On 07/03/2019 14:54, Peter Geoghegan wrote:
>> On Wed, Mar 6, 2019 at 10:15 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> After staring at the first patch for bit longer, a few things started to
>>> bother me:
>>>
>>> * The new struct is called BTScanInsert, but it's used for searches,
>>> too. It makes sense when you read the README, which explains the
>>> difference between "search scan keys" and "insertion scan keys", but now
>>> that we have a separate struct for this, perhaps we call insertion scan
>>> keys with a less confusing name. I don't know what to suggest, though.
>>> "Positioning key"?
>>
>> I think that insertion scan key is fine. It's been called that for
>> almost twenty years. It's not like it's an intuitive concept that
>> could be conveyed easily if only we came up with a new, pithy name.
>
> Yeah. It's been like that forever, but I must confess I hadn't paid any
> attention to it, until now. I had not understood that text in the README
> explaining the difference between search and insertion scan keys, before
> looking at this patch. Not sure I ever read it with any thought. Now
> that I understand it, I don't like the "insertion scan key" name.
>
>> BTW, I would like to hear what you think of the idea of minusinfkey
>> (and the !minusinfkey optimization) specifically.
>
> I don't understand it :-(. I guess that's valuable feedback on its own.
> I'll spend more time reading the code around that, but meanwhile, if you
> can think of a simpler way to explain it in the comments, that'd be good.
>
>>> The new BTInsertStateData struct also
>>> holds the current buffer we're considering to insert to, and a
>>> 'bounds_valid' flag to indicate if the saved bounds are valid for the
>>> current buffer. That way, it's more straightforward to clear the
>>> 'bounds_valid' flag whenever we move right.
>>
>> I'm not sure that that's an improvement. Moving right should be very
>> rare with my patch. gcov shows that we never move right here anymore
>> with the regression tests, or within _bt_check_unique() -- not once.
>
> I haven't given performance much thought, really. I don't expect this
> method to be any slower, but the point of the refactoring is to make the
> code easier to understand.
>
>>> /*
>>> * Do the insertion. First move right to find the correct page to
>>> * insert to, if necessary. If we're inserting to a non-unique index,
>>> * _bt_search() already did this when it checked if a move to the
>>> * right was required for leaf page. Insertion scankey's scantid
>>> * would have been filled out at the time. On a unique index, the
>>> * current buffer is the first buffer containing duplicates, however,
>>> * so we may need to move right to the correct location for this
>>> * tuple.
>>> */
>>> if (checkingunique || itup_key->heapkeyspace)
>>> _bt_findinsertpage(rel, &insertstate, stack, heapRel);
>>>
>>> newitemoff = _bt_binsrch_insert(rel, &insertstate);
>>>
>>> _bt_insertonpg(rel, insertstate.buf, InvalidBuffer, stack, itup,
>>> newitemoff, false);
>>>
>>> Does this make sense?
>>
>> I guess you're saying this because you noticed that the for (;;) loop
>> in _bt_findinsertloc() doesn't do that much in many cases, because of
>> the fastpath.
>
> The idea is that _bt_findinsertpage() would not need to know whether the
> unique checks were performed or not. I'd like to encapsulate all the
> information about the "insert position we're considering" in the
> BTInsertStateData struct. Passing 'checkingunique' as a separate
> argument violates that, because when it's set, the key means something
> slightly different.
>
> Hmm. Actually, with patch #2, _bt_findinsertloc() could look at whether
> 'scantid' is set, instead of 'checkingunique'. That would seem better.
> If it looks like 'checkingunique', it's making the assumption that if
> unique checks were not performed, then we are already positioned on the
> correct page, according to the heap TID. But looking at 'scantid' seems
> like a more direct way of getting the same information. And then we
> won't need to pass the 'checkingunique' flag as an "out-of-band" argument.
>
> So I'm specifically suggesting that we replace this, in _bt_findinsertloc:
>
> if (!checkingunique && itup_key->heapkeyspace)
> break;
>
> With this:
>
> if (itup_key->scantid)
> break;
>
> And remove the 'checkingunique' argument from _bt_findinsertloc.

Ah, scratch that. By the time we call _bt_findinsertloc(), scantid has
already been restored, even if it was not set originally when we did
_bt_search.

My dislike here is that passing 'checkingunique' as a separate argument
acts like a "modifier", changing slightly the meaning of the insertion
scan key. If it's not set, we know we're positioned on the correct page.
Otherwise, we might not be. And it's a pretty indirect way of saying
that, as it also depends 'heapkeyspace'. Perhaps add a flag to
BTInsertStateData, to indicate the same thing more explicitly. Something
like "bool is_final_insertion_page; /* when set, no need to move right */".

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-03-07 08:10:47 Re: Re: [HACKERS] proposal: schema variables
Previous Message Sergei Kornilov 2019-03-07 07:57:46 Re: pg_basebackup against older server versions