From: Andres Freund <andres(at)anarazel(dot)de>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-08-10 05:28:30
Message-ID: 20220810052830.g7vf2jktxix22wmt@awork3.anarazel.de
Lists: pgsql-hackers

Hi,
On 2022-08-09 20:21:19 -0700, Mark Dilger wrote:
> > On Aug 9, 2022, at 7:26 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > The relevant code triggering it:
> >
> >     newbuf = XLogInitBufferForRedo(record, 1);
> >     _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
> >                   xlrec->new_bucket_flag, true);
> >     if (!IsBufferCleanupOK(newbuf))
> >         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
> >
> > Why do we just crash if we don't already have a cleanup lock? That can't be
> > right. Or is there supposed to be a guarantee this can't happen?
>
> Perhaps the code assumes that when xl_hash_split_allocate_page record was
> written, the new_bucket field referred to an unused page, and so during
> replay it should also refer to an unused page, and being unused, that nobody
> will have it pinned. But at least in heap we sometimes pin unused pages
> just long enough to examine them and to see that they are unused. Maybe
> something like that is happening here?
I don't think it's a safe assumption that nobody would hold a pin on such a
page during recovery. While not the case here, somebody else could have used
pg_prewarm to read it in.
But also, the checkpointer or bgwriter could have it temporarily pinned, to
write it out, or another backend could try to write it out as a victim buffer
and have it temporarily pinned.
static int
SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
{
...
    /*
     * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
     * buffer is clean by the time we've locked it.)
     */
    PinBuffer_Locked(bufHdr);
    LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
As you can see we acquire a pin without holding a lock on the page (and that
can't be changed!).
I assume this is trying to defend against some sort of deadlock by not
actually getting a cleanup lock (i.e. by not passing get_cleanup_lock =
true to XLogReadBufferForRedoExtended(), which would wait for one).
I don't think it's possible to rely on a dirty page to never be pinned by
another backend. All you can rely on with a cleanup lock is that there's no
*prior* references to the buffer, and thus it's safe to reorganize the buffer,
because the pin-holder hasn't yet gotten a lock on the page.
> I'd be curious to see the count returned by
> BUF_STATE_GET_REFCOUNT(LockBufHdr(newbuf)) right before this panic. If it's
> just 1, then it's not another backend, but our own, and we'd want to debug
> why we're pinning the same page twice (or more) while replaying wal.
This was the first time I have seen this in a couple hundred runs, so
I don't think it's that easily debuggable for me.
> Otherwise, maybe it's a race condition with some other process that
> transiently pins a buffer and occasionally causes this code to panic?
As pointed out above, it's legal to have a transient pin on a page, so this
just looks like a bad assumption in the hash code to me.
Greetings,
Andres Freund