Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From: Andres Freund <andres(at)anarazel(dot)de>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-08-10 05:28:30
Message-ID: 20220810052830.g7vf2jktxix22wmt@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > The relevant code triggering it:
> >
> > newbuf = XLogInitBufferForRedo(record, 1);
> > _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> > xlrec->new_bucket_flag, true);
> > if (!IsBufferCleanupOK(newbuf))
> > elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> >
> > Why do we just crash if we don't already have a cleanup lock? That can't be
> > right. Or is there supposed to be a guarantee this can't happen?
>
> Perhaps the code assumes that when xl_hash_split_allocate_page record was
> written, the new_bucket field referred to an unused page, and so during
> replay it should also refer to an unused page, and being unused, that nobody
> will have it pinned. But at least in heap we sometimes pin unused pages
> just long enough to examine them and to see that they are unused. Maybe
> something like that is happening here?

I don't think it's a safe assumption that nobody would hold a pin on such a
page during recovery. While not the case here, somebody else could have used
pg_prewarm to read it in.

But also, the checkpointer or bgwriter could have it temporarily pinned, to
write it out, or another backend could try to write it out as a victim buffer
and have it temporarily pinned.

static int
SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
{
...
/*
* Pin it, share-lock it, write it. (FlushBuffer will do nothing if the
* buffer is clean by the time we've locked it.)
*/
PinBuffer_Locked(bufHdr);
LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);

As you can see we acquire a pin without holding a lock on the page (and that
can't be changed!).

I assume this is trying to defend against some sort of deadlock by not
actually getting a cleanup lock (by passing get_cleanup_lock = true to
XLogReadBufferForRedoExtended()).

I don't think it's possible to rely on a dirty page to never be pinned by
another backend. All you can rely on with a cleanup lock is that there's no
*prior* references to the buffer, and thus it's safe to reorganize the buffer,
because the pin-holder hasn't yet gotten a lock on the page.

> I'd be curious to see the count returned by
> BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic. If it's
> just 1, then it's not another backend, but our own, and we'd want to debug
> why we're pinning the same page twice (or more) while replaying wal.

This was the first time in a couple hundred runs on that I have seen this, so
I don't think it's that easily debuggable for me.

> Otherwise, maybe it's a race condition with some other process that
> transiently pins a buffer and occasionally causes this code to panic?

As pointed out above, it's legal to have a transient pin on a page, so this
just looks like a bad assumption in the hash code to me.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-08-10 05:35:09 Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Previous Message Simon Riggs 2022-08-10 05:09:38 Blocking the use of TRIGGER privilege