Re: Deadlock in XLogInsert at AIX

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deadlock in XLogInsert at AIX
Date: 2017-01-30 16:21:59
Message-ID: 0184657d-cf64-10f6-9706-7329f23f5f89@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/24/2017 04:47 PM, Konstantin Knizhnik wrote:
> As I already mentioned, we built Postgres with LOCK_DEBUG , so we can
> inspect lock owner. Backend is waiting for itself!
> Now please look at two frames in this stack trace marked with red.
> XLogInsertRecord is setting WALInsert locks at the beginning of the
> function:
>
> if (isLogSwitch)
> WALInsertLockAcquireExclusive();
> else
> WALInsertLockAcquire();
>
> WALInsertLockAcquire just selects random item from WALInsertLocks array
> and exclusively locks:
>
> if (lockToTry == -1)
> lockToTry = MyProc->pgprocno % NUM_XLOGINSERT_LOCKS;
> MyLockNo = lockToTry;
> immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
>
> Then, following the stack trace, AdvanceXLInsertBuffer calls
> WaitXLogInsertionsToFinish:
>
> /*
> * Now that we have an up-to-date LogwrtResult value, see if we
> * still need to write it or if someone else already did.
> */
> if (LogwrtResult.Write < OldPageRqstPtr)
> {
> /*
> * Must acquire write lock. Release WALBufMappingLock
> first,
> * to make sure that all insertions that we need to
> wait for
> * can finish (up to this same position). Otherwise we risk
> * deadlock.
> */
> LWLockRelease(WALBufMappingLock);
>
> WaitXLogInsertionsToFinish(OldPageRqstPtr);
>
> LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
>
>
> It releases WALBufMappingLock but not WAL insert locks!
> Finally in WaitXLogInsertionsToFinish tries to wait for all locks:
>
> for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
> {
> XLogRecPtr insertingat = InvalidXLogRecPtr;
>
> do
> {
> /*
> * See if this insertion is in progress. LWLockWait will
> wait for
> * the lock to be released, or for the 'value' to be set by a
> * LWLockUpdateVar call. When a lock is initially
> acquired, its
> * value is 0 (InvalidXLogRecPtr), which means that we
> don't know
> * where it's inserting yet. We will have to wait for it. If
> * it's a small insertion, the record will most likely fit
> on the
> * same page and the inserter will release the lock without
> ever
> * calling LWLockUpdateVar. But if it has to sleep, it will
> * advertise the insertion point with LWLockUpdateVar before
> * sleeping.
> */
> if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
> &WALInsertLocks[i].l.insertingAt,
> insertingat, &insertingat))
>
> And here we stuck!

Interesting.. What should happen here is that for the backend's own
insertion slot, the "insertingat" value should be greater than the
requested flush point ('upto' variable). That's because before
GetXLogBuffer() calls AdvanceXLInsertBuffer(), it updates the backend's
insertingat value, to the position that it wants to insert to. And
AdvanceXLInsertBuffer() only calls WaitXLogInsertionsToFinish() with
value smaller than what was passed as the 'upto' argument.

> The comment to WaitXLogInsertionsToFinish says:
>
> * Note: When you are about to write out WAL, you must call this function
> * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
> * need to wait for an insertion to finish (or at least advance to next
> * uninitialized page), and the inserter might need to evict an old WAL
> buffer
> * to make room for a new one, which in turn requires WALWriteLock.
>
> Which contradicts to the observed stack trace.

Not AFAICS. In the stack trace you showed, the backend is not holding
WALWriteLock. It would only acquire it after the
WaitXLogInsertionsToFinish() call finished.

> I wonder if it is really synchronization bug in xlog.c or there is
> something wrong in this stack trace and it can not happen in case of
> normal work?

Yeah, hard to tell. Something is clearly wrong..

This line in the stack trace is suspicious:

> WaitXLogInsertionsToFinish(upto = 102067874328), line 1583 in "xlog.c"

AdvanceXLInsertBuffer() should only ever call
WaitXLogInsertionsToFinish() with an xlog position that points to a page
bounary, but that upto value points to the middle of a page.

Perhaps the value stored in the stack trace is not what the caller
passed, but it was updated because it was past the 'reserveUpto' value?
That would explain the "request to flush past end
of generated WAL" notices you saw in the log. Now, why would that
happen, I have no idea.

If you can and want to provide me access to the system, I could have a
look myself. I'd also like to see if the attached additional Assertions
will fire.

- Heikki

Attachment Content-Type Size
extra-asserts-in-AdvanceXLInsertBuffer.patch text/plain 864 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-01-30 16:32:36 Re: privileges for changing schema owner
Previous Message Alvaro Herrera 2017-01-30 16:12:16 Re: multivariate statistics (v19)