Re: nbtree: assertion failure in _bt_killitems() for posting tuple

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: nbtree: assertion failure in _bt_killitems() for posting tuple
Date: 2020-03-24 08:00:17
Message-ID: 03da9588-75dc-3527-d6d0-0548b60cb251@postgrespro.ru
Lists: pgsql-hackers

On 20.03.2020 03:34, Peter Geoghegan wrote:
> On Thu, Mar 19, 2020 at 9:34 AM Anastasia Lubennikova
> <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>> During tests, we caught an assertion failure in _bt_killitems() for a
>> posting tuple in a unique index:
>>
>> /* kitem must have matching offnum when heap TIDs match */
>> Assert(kitem->indexOffset == offnum);
>>
>> https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtutils.c#L1809
>>
>> I struggle to understand the meaning of this assertion.
>> Don't we allow the chance that posting tuple moved right on the page as
>> the comment says?
> I think you're right. However, it still seems like we should check
> that "kitem->indexOffset" is consistent among all of the BTScanPosItem
> entries that we have for each TID that we believe to be from the same
> posting list tuple.
What kind of consistency do you mean here?
We can probably change this assertion to
    Assert(kitem->indexOffset <= offnum);

Anything else?
> (Thinks some more...)
>
> Even if the offnum changes when the buffer lock is released, due to
> somebody inserting on to the same page, I guess that we still expect
> to observe all of the heap TIDs together in the posting list. Though
> maybe not. Maybe it's possible for a deduplication pass to occur when
> the buffer lock is dropped, in which case we should arguably behave in
> the same way when we see the same heap TIDs (i.e. delete the entire
> posting list without regard for whether or not the TIDs happened to
> appear in a posting list initially). I'm not sure, though.
>
> It will make no difference most of the time, since the
> kill_prior_tuple stuff is generally not applied when the page is
> changed at all -- the LSN is checked by the logic added by commit
> 2ed5b87f. That's why I asked about unlogged indexes (we don't do the
> LSN thing there). But I still think that we need to take a firm
> position on it.
>
It was a logged index, though the failed test setup includes logical
replication. Does that handle LSNs differently?
Finally, Alexander Lakhin managed to reproduce this on master. The test
is attached as a patch.

Speaking of unlogged indexes: there, the situation where items have moved
left on the page is legal even if the LSN hasn't changed.
In any case, the loop starts from the offset that we saved in the first pass:

      OffsetNumber offnum = kitem->indexOffset;
      while (offnum <= maxoff)
        ...

It still works correctly, but microvacuum probably becomes less
efficient if items were concurrently deduplicated.
I wonder if this case is worth optimizing?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
000_rep_btree_test.patch text/x-patch 3.0 KB
