From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock |
Date: | 2022-08-11 16:36:38 |
Message-ID: | CALDaNm2Avar8KWTx-DxxOGW1pE9VtYrYMcue7gQPj5pJ-5ttbw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 10, 2022 at 2:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Aug 10, 2022 at 10:58 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > > > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >
> > > > The relevant code triggering it:
> > > >
> > > >     newbuf = XLogInitBufferForRedo(record, 1);
> > > >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > > >                   xlrec->new_bucket_flag, true);
> > > >     if (!IsBufferCleanupOK(newbuf))
> > > >         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> > > >
> > > > Why do we just crash if we don't already have a cleanup lock? That can't be
> > > > right. Or is there supposed to be a guarantee this can't happen?
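The condition behind that PANIC can be modeled in a few lines. The sketch below is a simplified toy, loosely following what `IsBufferCleanupOK()` checks (the caller holds the exclusive content lock and exactly one pin of its own, and the shared pin count is 1). `ToyBufferDesc` and `toy_is_cleanup_ok()` are hypothetical names, not PostgreSQL code. The point it illustrates is that any extra pin, from any process, is enough to make the check fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared buffer header. */
typedef struct
{
    int shared_refcount;    /* pins held by all processes combined */
} ToyBufferDesc;

/*
 * Toy cleanup-lock test: the caller must already hold the exclusive content
 * lock, must hold exactly one pin itself, and no other process may hold a pin.
 */
static bool
toy_is_cleanup_ok(const ToyBufferDesc *buf, int my_pins, bool have_exclusive_lock)
{
    assert(have_exclusive_lock);            /* precondition: caller holds the exclusive lock */
    if (my_pins != 1)
        return false;                       /* caller must hold exactly one pin */
    assert(buf->shared_refcount >= my_pins);
    return buf->shared_refcount == 1;       /* any foreign pin makes this false */
}
```

With `shared_refcount = 2` (for example, because a checkpointer or bgwriter pinned the page for a moment), the toy check returns false, which in the redo code quoted above leads straight to the PANIC.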
> > >
> > > Perhaps the code assumes that when the xl_hash_split_allocate_page record was
> > > written, the new_bucket field referred to an unused page, and so during
> > > replay it should also refer to an unused page, and being unused, that nobody
> > > will have it pinned. But at least in heap we sometimes pin unused pages
> > > just long enough to examine them and to see that they are unused. Maybe
> > > something like that is happening here?
> >
> > I don't think it's a safe assumption that nobody would hold a pin on such a
> > page during recovery. While not the case here, somebody else could have used
> > pg_prewarm to read it in.
> >
> > But also, the checkpointer or bgwriter could have it temporarily pinned, to
> > write it out, or another backend could try to write it out as a victim buffer
> > and have it temporarily pinned.
> >
> >
> > static int
> > SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
> > {
> > ...
> >     /*
> >      * Pin it, share-lock it, write it. (FlushBuffer will do nothing if the
> >      * buffer is clean by the time we've locked it.)
> >      */
> >     PinBuffer_Locked(bufHdr);
> >     LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
> >
> >
> > As you can see we acquire a pin without holding a lock on the page (and that
> > can't be changed!).
> >
>
> I think this could be the reason for the failure, though I haven't tried
> to debug/reproduce it yet. AFAIU, this is possible during recovery
> (replay of the WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE), because
> XLogReadBufferForRedoExtended can mark the buffer dirty while restoring
> a full page image. OTOH, because the page was not marked dirty during
> normal operation, SyncOneBuffer would have skipped it due to the check
> (if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))).
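The early-exit test Amit refers to can be isolated as a predicate. In this sketch the flag values `TOY_BM_VALID` and `TOY_BM_DIRTY` and the function `toy_sync_would_pin()` are hypothetical stand-ins, not PostgreSQL's actual `BM_*` bit values or code. It only shows that a buffer reaches the pin-and-write step when it is both valid and dirty, so under the theory above a page that was never dirtied on the primary is skipped there, while the same page dirtied during replay on the standby could be pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real BM_* values are defined by PostgreSQL. */
#define TOY_BM_VALID ((uint32_t) 1 << 0)
#define TOY_BM_DIRTY ((uint32_t) 1 << 1)

static_assert((TOY_BM_VALID & TOY_BM_DIRTY) == 0, "toy flag bits must not overlap");

/*
 * Mirrors the shape of the quoted check: a buffer that is not valid, or not
 * dirty, is treated as clean and skipped without being pinned.
 */
static bool
toy_sync_would_pin(uint32_t buf_state)
{
    if (!(buf_state & TOY_BM_VALID) || !(buf_state & TOY_BM_DIRTY))
        return false;       /* "clean": nothing to write, no pin taken */
    return true;            /* would go on to pin, share-lock and write */
}
```

For example, `toy_sync_would_pin(TOY_BM_VALID)` is false, while `toy_sync_would_pin(TOY_BM_VALID | TOY_BM_DIRTY)` is true.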

I'm trying to simulate this scenario with streaming replication using the statements below:
CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;

With the above statements, the standby replays the page allocation for a
split operation. I will slightly modify the above statements, debug, and
see whether we can make the background writer process pin this buffer to
simulate the scenario. I will post my findings once I'm done with the
analysis.

Regards,
Vignesh