Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-08-10 04:38:45
Message-ID: CA+hUKGLbS0O85J=fMMze2BCNRQi5VB376-GiJnqK__U0=A02eg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 10, 2022 at 3:21 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > The relevant code triggering it:
> >
> > newbuf = XLogInitBufferForRedo(record, 1);
> > _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > xlrec->new_bucket_flag, true);
> > if (!IsBufferCleanupOK(newbuf))
> > elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> >
> > Why do we just crash if we don't already have a cleanup lock? That can't be
> > right. Or is there supposed to be a guarantee this can't happen?
>
> Perhaps the code assumes that when xl_hash_split_allocate_page record was written, the new_bucket field referred to an unused page, and so during replay it should also refer to an unused page, and being unused, that nobody will have it pinned. But at least in heap we sometimes pin unused pages just long enough to examine them and to see that they are unused. Maybe something like that is happening here?

Here's an email about that:

https://www.postgresql.org/message-id/CAE9k0P=OXww6RQCGrmDNa8=L3EeB01SGbYuP23y-qZJ=4td38Q@mail.gmail.com
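
The kind of transient pin Mark is describing would look something like
this (a hypothetical illustration, not a specific call site -- rel and
blkno are made up):

/* some code path peeking at a page to see whether it's in use */
Buffer      buf = ReadBuffer(rel, blkno);

LockBuffer(buf, BUFFER_LOCK_SHARE);
if (PageIsNew(BufferGetPage(buf)))
{
    /* page is unused, nothing for this code path to do */
}
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buf);

While that pin is held, IsBufferCleanupOK() on the same buffer in
another process returns false, which is all it would take to hit the
PANIC above.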

> I'd be curious to see the count returned by BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic. If it's just 1, then it's not another backend, but our own, and we'd want to debug why we're pinning the same page twice (or more) while replaying wal. Otherwise, maybe it's a race condition with some other process that transiently pins a buffer and occasionally causes this code to panic?

But which backend could that be? We aren't starting any at that point
in the test.
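
If it would help, here's a rough, untested sketch of the kind of
instrumentation that could capture that count just before the PANIC
(using the buf_internals.h accessors; nothing here is meant as a fix):

/* needs storage/buf_internals.h for LockBufHdr etc. */
if (!IsBufferCleanupOK(newbuf))
{
    BufferDesc *bufHdr = GetBufferDescriptor(newbuf - 1);
    uint32      buf_state = LockBufHdr(bufHdr);

    /* how many pins does the new bucket's buffer have, counting ours? */
    elog(LOG, "new bucket buffer %d has refcount %u",
         newbuf, BUF_STATE_GET_REFCOUNT(buf_state));
    UnlockBufHdr(bufHdr, buf_state);

    elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
}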

Someone might wonder if it's the startup process itself via the new
WAL prefetching machinery, but that doesn't pin pages; it only probes
the buffer mapping table to see whether future pages are already
cached (see PrefetchSharedBuffer() in bufmgr.c). (This is a topic
I've thought about a bit, because I have another installment of
recovery prefetching in development, using real AIO, that *does* pin
pages in advance and has to deal with code that wants cleanup locks
like this...)
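
For anyone following along: the cleanup lock we fail to get here
requires that we hold the buffer's content lock exclusively *and* that
our pin is the only pin on the buffer anywhere in the cluster, so a
single transient pin from any other backend is enough to trip the
PANIC.  Paraphrasing from memory (see bufmgr.c for the real thing; the
real function also checks our own local pin count and Asserts the
content lock rather than testing it), IsBufferCleanupOK() amounts to
roughly:

static bool
is_buffer_cleanup_ok_sketch(Buffer buffer)
{
    BufferDesc *bufHdr = GetBufferDescriptor(buffer - 1);
    uint32      buf_state;
    bool        ok;

    /* we must already hold the content lock in exclusive mode */
    if (!LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
                              LW_EXCLUSIVE))
        return false;

    buf_state = LockBufHdr(bufHdr);
    /* ... and ours must be the only pin */
    ok = (BUF_STATE_GET_REFCOUNT(buf_state) == 1);
    UnlockBufHdr(bufHdr, buf_state);

    return ok;
}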

It's possible that git log src/backend/access/hash/ can explain a
behaviour change, as there have been some recent commits there, but
nothing jumps out at me. Maybe 4f1f5a7f "Remove fls(), use
pg_leftmost_one_pos32() instead." has a maths error, but I don't see
one. Maybe e09d7a12 "Improve speed of hash index build." accidentally
reaches a new state and triggers a latent bug. Or maybe a latent bug
is only showing up now because we started testing recovery not too
long ago... but all of those theories still need another backend
involved. We can see which blocks the startup process itself has
pinned, and 23 != 45. Hmmm.
