From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay
Date: 2019-07-25 01:24:26
Message-ID: CAMsr+YEuz4XwZX_QmnX_-2530XhyAmnK=zCmicEnq1vLr0aZ-g@mail.gmail.com
Lists: pgsql-hackers

Hi folks

I recently tracked down a race in shutdown of logical walsenders that can
cause PostgreSQL shutdown to hang for wal_sender_timeout/2 before it
continues to a normal shutdown. With a long timeout that can be quite
disruptive.

TL;DR: The logical walsender may be signalled to stop, then read the last
WAL record before the shutdown checkpoint is due to be written and go to
sleep. The checkpointer will wait for it to acknowledge the shutdown and
the walsender will wait for new WAL. The deadlock is eventually broken by
the walsender timeout keepalive timer.

Patch attached.

The issue arises from the difference between logical and physical walsender
shutdown as introduced by commit c6c3334364 "Prevent possibility of panics
during shutdown checkpoint". It's probably fairly hard to trigger. I ran
into a case where it happened regularly only because of an unrelated patch
that caused some WAL to be written just before the checkpointer issued
walsender shutdown requests. But it's still a legit bug.

If you hit the issue, the walsender(s) can be seen sleeping in
WaitLatchOrSocket in WalSndLoop; they keep sleeping until woken by the
keepalive timeout. The checkpointer will be waiting in
WalSndWaitStopping() for the walsenders to enter WALSNDSTATE_STOPPING or
exit, whichever happens first. The postmaster will be waiting in ServerLoop
for the checkpointer to finish the shutdown checkpoint.

The checkpointer waits in WalSndWaitStopping() for all walsenders to either
exit or enter WALSNDSTATE_STOPPING state. Logical walsenders never enter
WALSNDSTATE_STOPPING; they go straight to exiting, so the checkpointer
cannot finish WalSndWaitStopping() and write the shutdown checkpoint until
they are gone. A
logical walsender usually notices the shutdown request and exits as soon as
it has flushed all WAL up to the server's flushpoint, while physical
walsenders enter WALSNDSTATE_STOPPING.
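
For context, the checkpointer's wait loop is roughly the following (a
simplified paraphrase of WalSndWaitStopping() in
src/backend/replication/walsender.c, details elided):

void
WalSndWaitStopping(void)
{
    for (;;)
    {
        int     i;
        bool    all_stopped = true;

        for (i = 0; i < max_wal_senders; i++)
        {
            WalSnd *walsnd = &WalSndCtl->walsnds[i];

            SpinLockAcquire(&walsnd->mutex);
            /* pid == 0 means this slot's walsender has already exited */
            if (walsnd->pid != 0 && walsnd->state != WALSNDSTATE_STOPPING)
                all_stopped = false;
            SpinLockRelease(&walsnd->mutex);
        }

        /* every walsender has exited or acknowledged the stop request */
        if (all_stopped)
            return;

        pg_usleep(10000L);      /* 10ms */
    }
}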

But there's a race where a logical walsender may read the final available
record and notice it has caught up, without ever reaching the code path
that notices end-of-WAL and checks whether it should exit. This happens on
the following (simplified) code path in XLogSendLogical:

if (record != NULL)
{
    XLogRecPtr  flushPtr = GetFlushRecPtr();

    LogicalDecodingProcessRecord(...);
    sentPtr = ...;

    if (sentPtr >= flushPtr)
        WalSndCaughtUp = true;      // <-- HERE
}

The test for got_STOPPING that sets got_SIGUSR2 exists only on the other
branch, where reading a record returns NULL, so this branch can go back to
sleep without ever checking whether shutdown was requested.
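
For comparison, the record == NULL branch looks roughly like this
(simplified), and it is the only place a stop request is turned into an
orderly exit:

else
{
    /* Caught up if the end of the last read attempt is at or past the
     * flush pointer. */
    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
    {
        WalSndCaughtUp = true;

        /*
         * A shutdown was requested: have WalSndLoop terminate the
         * connection in an orderly manner. The record != NULL branch
         * above has no equivalent test.
         */
        if (got_STOPPING)
            got_SIGUSR2 = true;
    }
}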

So if the walsender has read the last available WAL record, that record is
at or past the flush pointer, and the SIGUSR1 latch wakeup for the WAL
write has already been handled, the walsender can go back to sleep and not
wake up until the keepalive timeout.

The checkpointer already sent PROCSIG_WALSND_INIT_STOPPING to the
walsenders in the prior WalSndInitStopping() call, so the walsender won't
be woken by a signal from the checkpointer. No new WAL will be written,
because the walsender just consumed the final record written before the
checkpointer began waiting in WalSndWaitStopping(), and the checkpointer
won't write anything more until the walsender exits. The client might not
be due a keepalive for some time. The only reason this doesn't turn into a
total deadlock is that keepalive wakeup.
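
One way to close the race, in the spirit of that diagnosis, is to perform
the same got_STOPPING test when the record-processing branch notices it has
caught up. A sketch of the idea only, not necessarily identical to the
attached patch:

if (record != NULL)
{
    XLogRecPtr  flushPtr = GetFlushRecPtr();

    LogicalDecodingProcessRecord(...);
    sentPtr = ...;

    if (sentPtr >= flushPtr)
    {
        WalSndCaughtUp = true;

        /* don't go back to sleep if a shutdown was already requested */
        if (got_STOPPING)
            got_SIGUSR2 = true;
    }
}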

An alternative fix would be to have the logical walsender set
WALSNDSTATE_STOPPING instead of faking got_SIGUSR2, then go to sleep
waiting for more WAL. Logical decoding would then need to check whether it
is running during shutdown, and Assert() (then ERROR in production builds)
if it sees any WAL records that result in output plugin calls or snapshot
management calls. I
avoided this approach as it's more intrusive and I'm not confident I can
concoct a reliable test to trigger it.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

Attachment: 0001-Fix-a-delay-in-PostgreSQL-shutdown-caused-by-logical.patch (text/x-patch, 4.6 KB)
