Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masahiko(dot)sawada(at)2ndquadrant(dot)com
Cc: pg(at)bowt(dot)ie, pgsql-hackers(at)postgresql(dot)org, alvherre(at)2ndquadrant(dot)com
Subject: Re: Multiple FPI_FOR_HINT for the same block during killing btree index items
Date: 2020-04-10 04:30:13
Message-ID: 20200410.133013.920903448679757354.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 10 Apr 2020 12:32:31 +0900, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote in
> On Fri, 10 Apr 2020 at 04:05, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > On Wed, Apr 8, 2020 at 10:56 PM Masahiko Sawada
> > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > Here is the reproducer:
> >
> > What version of Postgres did you notice the actual customer issue on?
> > I ask because I wonder if the work on B-Tree indexes in Postgres 12
> > affects the precise behavior you get here with real world workloads.
> > It probably makes _bt_killitems() more effective with some workloads,
> > which naturally increases the likelihood of having multiple FPI issued
> > in the manner that you describe. OTOH, it might make it less likely
> > with low cardinality indexes, since large groups of garbage duplicate
> > tuples tend to get concentrated on just a few leaf pages.
> >
> > > The inner test in the comment "found the item" never tests the item
> > > for being dead. So maybe we can add !ItemIdIsDead(iid) to that
> > > condition. But there still is a race condition of recording multiple
> > > FPIs can happen. Maybe a better solution is to change the lock to
> > > exclusive, at least when wal_log_hints = on, so that only one process
> > > can run this code -- the reduction in concurrency might be won back by
> > > the fact that we don't wal-log the page multiple times.
> >
> > I like the idea of checking !ItemIdIsDead(iid) as a further condition
> > of killing the item -- there is clearly no point in doing work to kill
> > an item that is already dead. I don't like the idea of using an
> > exclusive buffer lock (even if it's just with wal_log_hints = on),
> > though.
> >
>
> Okay. I think only adding the check would also help with reducing the
> likelihood. How about the changes for the current HEAD I've attached?

FWIW, looks good to me.

> Related to this behavior on btree indexes, this can happen even on
> heaps during searching heap tuples. To reduce the likelihood of that
> more generally I wonder if we can acquire a lock on buffer descriptor
> right before XLogSaveBufferForHint() and set a flag to the buffer
> descriptor that indicates that we're about to log FPI for hint bit so
> that concurrent process can be aware of that.

Makes sense if the lock were acquired just before the "BM_DIRTY |
BM_JUST_DIRTIED) check. Could we use double-checking, as similar to
the patch for ItemIdIsDead()?

> if ((pg_atomic_read_u32(&bufHdr->state) & (BM_DIRTY | BM_JUST_DIRTIED)) !=
> (BM_DIRTY | BM_JUST_DIRTIED))
> {
...
> * essential that CreateCheckpoint waits for virtual transactions
> * rather than full transactionids.
> */
> /* blah, blah */
> buf_state = LockBufHdr(bufHdr);
>
> if (buf_state & (BM_ | BM_JUST) != (..))
> {
> MyProc->delayChkpt = delayChkpt = true;
> lsn = XLogSaveBufferForHint(buffer, buffer_std);
> }
> }
> else
> buf_state = LockBuffer(bufHdr);

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-04-10 05:04:02 Re: Vacuum o/p with (full 1, parallel 0) option throwing an error
Previous Message Fujii Masao 2020-04-10 04:20:48 Re: SyncRepLock acquired exclusively in default configuration