Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-08-11 16:36:38
Message-ID: CALDaNm2Avar8KWTx-DxxOGW1pE9VtYrYMcue7gQPj5pJ-5ttbw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 10, 2022 at 2:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Aug 10, 2022 at 10:58 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > > > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >
> > > > The relevant code triggering it:
> > > >
> > > >     newbuf = XLogInitBufferForRedo(record, 1);
> > > >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > > >                   xlrec->new_bucket_flag, true);
> > > >     if (!IsBufferCleanupOK(newbuf))
> > > >         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> > > >
> > > > Why do we just crash if we don't already have a cleanup lock? That can't be
> > > > right. Or is there supposed to be a guarantee this can't happen?
> > >
> > > Perhaps the code assumes that when xl_hash_split_allocate_page record was
> > > written, the new_bucket field referred to an unused page, and so during
> > > replay it should also refer to an unused page, and being unused, that nobody
> > > will have it pinned. But at least in heap we sometimes pin unused pages
> > > just long enough to examine them and to see that they are unused. Maybe
> > > something like that is happening here?
> >
> > I don't think it's a safe assumption that nobody would hold a pin on such a
> > page during recovery. While not the case here, somebody else could have used
> > pg_prewarm to read it in.
> >
> > But also, the checkpointer or bgwriter could have it temporarily pinned, to
> > write it out, or another backend could try to write it out as a victim buffer
> > and have it temporarily pinned.
> >
> >
> > static int
> > SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
> > {
> > ...
> >     /*
> >      * Pin it, share-lock it, write it. (FlushBuffer will do nothing if the
> >      * buffer is clean by the time we've locked it.)
> >      */
> >     PinBuffer_Locked(bufHdr);
> >     LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
> >
> >
> > As you can see we acquire a pin without holding a lock on the page (and that
> > can't be changed!).
> >
>
> I think this could be the probable reason for failure though I didn't
> try to debug/reproduce this yet. AFAIU, this is possible during
> recovery/replay of WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE as via
> XLogReadBufferForRedoExtended, we can mark the buffer dirty while
> restoring from full page image. OTOH, because during normal operation
> we didn't mark the page dirty SyncOneBuffer would have skipped it due
> to check (if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).

I'm trying to simulate the scenario in streaming replication using the
statements below:

CREATE TABLE pvactst (i INT, a INT[], p POINT) WITH (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1)
    FROM generate_series(1,1000) i;

With these statements, the standby is able to replay the allocation of
a page for the split operation. I will modify the statements slightly
and debug to see whether the background writer process can be made to
pin this buffer, so that the failure can be reproduced. I will post my
findings once I'm done with the analysis.
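
To make the suspected sequence concrete, here is a small standalone toy
model in C. It is only a sketch of the hypothesis discussed above, not
PostgreSQL code: ToyBuffer, the TOY_BM_* bits and all toy_* functions
are hypothetical names, and the model assumes that "cleanup OK" simply
means the redo process holds the only pin. It ignores content locks and
real concurrency entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy flag bits; names echo the thread, values are made up. */
#define TOY_BM_VALID (1u << 0)
#define TOY_BM_DIRTY (1u << 1)

/* Hypothetical stand-in for a shared buffer header. */
typedef struct ToyBuffer
{
    uint32_t flags;     /* TOY_BM_* bits */
    int      pin_count; /* number of processes holding a pin */
} ToyBuffer;

static void
toy_pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

static void
toy_unpin(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/*
 * Writer side: skip buffers that are not both valid and dirty, otherwise
 * take a pin without checking who else holds one.  Returns true if a pin
 * was taken.
 */
static bool
toy_sync_one_buffer_pin(ToyBuffer *buf)
{
    if (!(buf->flags & TOY_BM_VALID) || !(buf->flags & TOY_BM_DIRTY))
        return false;
    toy_pin(buf);
    return true;
}

/* Redo side: cleanup is OK only if we hold the sole pin (assumption). */
static bool
toy_cleanup_ok(const ToyBuffer *buf)
{
    return buf->pin_count == 1;
}

/*
 * Toy replay of the "split allocate page" step.  page_dirtied models the
 * buffer having been marked dirty during replay; writer_runs models the
 * writer visiting the buffer between redo's pin and redo's check.
 * Returns the result of the cleanup check.
 */
static bool
toy_redo_split_allocate(bool page_dirtied, bool writer_runs)
{
    ToyBuffer buf = { TOY_BM_VALID, 0 };
    bool      writer_pinned = false;
    bool      ok;

    toy_pin(&buf);                      /* redo's own pin */
    if (page_dirtied)
        buf.flags |= TOY_BM_DIRTY;
    if (writer_runs)
        writer_pinned = toy_sync_one_buffer_pin(&buf);

    ok = toy_cleanup_ok(&buf);          /* real code would PANIC if false */

    if (writer_pinned)
        toy_unpin(&buf);
    toy_unpin(&buf);
    return ok;
}
```

In this model toy_redo_split_allocate(true, true) returns false, which
corresponds to the case where the real check would raise the PANIC,
while a clean page is skipped by the writer and the check passes.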

Regards,
Vignesh
