corner case about replication and shutdown

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: corner case about replication and shutdown
Date: 2011-03-23 13:46:01
Message-ID: AANLkTiniNAhRAr+qDj39uJnHkGHFSFyMn6+Z+0ubBfmC@mail.gmail.com
Lists: pgsql-hackers

Hi,

While reading the shutdown code to write the smart shutdown patch for sync rep,
I found a corner case where shutdown can get stuck forever. This happens
when the postmaster reaches the PM_WAIT_BACKENDS state before the walsender
marks itself as a WAL sender process for streaming WAL (i.e., before the
walsender calls MarkPostmasterChildWalSender()). In this case,
CountChildren(NORMAL) in PostmasterStateMachine() returns non-zero because a
normal backend (i.e., the would-be walsender) is running, so the postmaster,
in the PM_WAIT_BACKENDS state, returns from PostmasterStateMachine() without
changing state. Then the backend receives the START_REPLICATION command,
declares itself a walsender, and CountChildren(NORMAL) now returns zero.

The problem is that this declaration doesn't trigger
PostmasterStateMachine() at all.
So, even though there are no normal backends left, the postmaster never
calls PostmasterStateMachine() again and cannot move out of PM_WAIT_BACKENDS.
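To make the sequence concrete, here is a toy Python model of the race. It is
not PostgreSQL's C code: the class Postmaster, the states, and the methods
count_normal(), state_machine(), child_exited() and mark_walsender() are
hypothetical stand-ins for pmState, CountChildren(NORMAL),
PostmasterStateMachine() and MarkPostmasterChildWalSender(). The only
assumption carried over from the description is that the state machine is
re-run on events such as a child exiting, but not when a child becomes a
walsender.

```python
from enum import Enum, auto


class PMState(Enum):
    RUN = auto()
    WAIT_BACKENDS = auto()
    SHUTDOWN = auto()  # hypothetical stand-in for the states after PM_WAIT_BACKENDS


class Postmaster:
    """Toy model: the state machine only runs when an event calls it."""

    def __init__(self):
        self.state = PMState.RUN
        self.children = {}  # pid -> "normal" or "walsender"

    def count_normal(self):
        # Stand-in for CountChildren(NORMAL): walsenders are not counted.
        return sum(1 for kind in self.children.values() if kind == "normal")

    def state_machine(self):
        # Advance out of WAIT_BACKENDS only when no normal backends remain.
        if self.state is PMState.WAIT_BACKENDS and self.count_normal() == 0:
            self.state = PMState.SHUTDOWN

    def start_child(self, pid):
        self.children[pid] = "normal"

    def request_shutdown(self):
        self.state = PMState.WAIT_BACKENDS
        self.state_machine()

    def child_exited(self, pid):
        del self.children[pid]
        self.state_machine()  # a child exit re-runs the state machine

    def mark_walsender(self, pid):
        self.children[pid] = "walsender"
        # No state_machine() call here: this is the gap in question.


pm = Postmaster()
pm.start_child(42)      # would-be walsender, still counted as a normal backend
pm.request_shutdown()   # count_normal() is 1, so the state does not advance
pm.mark_walsender(42)   # START_REPLICATION arrives; count_normal() drops to 0
print(pm.count_normal(), pm.state)  # no normal backends, yet still waiting
```

Nothing in the model ever calls state_machine() again after mark_walsender(),
so the postmaster stays in WAIT_BACKENDS with zero normal backends.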

I think this problem is harmless in practice since it doesn't happen
very often. But it can happen...

The simple fix is to change ServerLoop() so that it periodically calls
PostmasterStateMachine() while shutdown is in progress. I also considered
changing PostmasterStateMachine() itself, but that looked complicated.
Thoughts?
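A rough sketch of the proposed fix, again as a toy Python model rather than
the actual ServerLoop() code. Each iteration of the hypothetical
server_loop() stands in for one wakeup of the postmaster's main loop (assumed
here to happen on an event or a timeout); the event names "connect",
"shutdown" and "walsender" are illustrative only. The change is that, once
shutdown has started, the state machine is re-run on every iteration rather
than only when a child exits.

```python
from collections import deque

RUN, WAIT_BACKENDS, SHUTDOWN = "RUN", "WAIT_BACKENDS", "SHUTDOWN"


class Postmaster:
    """Toy model of a postmaster loop with a periodic state-machine check."""

    def __init__(self):
        self.state = RUN
        self.children = {}  # pid -> "normal" or "walsender"

    def count_normal(self):
        # Stand-in for CountChildren(NORMAL).
        return sum(kind == "normal" for kind in self.children.values())

    def state_machine(self):
        if self.state == WAIT_BACKENDS and self.count_normal() == 0:
            self.state = SHUTDOWN

    def server_loop(self, events, max_ticks=10):
        for _ in range(max_ticks):
            # Handle at most one queued event per wakeup.
            if events:
                kind, pid = events.popleft()
                if kind == "connect":
                    self.children[pid] = "normal"
                elif kind == "shutdown":
                    self.state = WAIT_BACKENDS
                    self.state_machine()
                elif kind == "walsender":
                    # Still no direct trigger, exactly as before the fix.
                    self.children[pid] = "walsender"
            # Proposed fix: while shutting down, re-check on every wakeup,
            # even if no child has exited.
            if self.state != RUN:
                self.state_machine()
            if self.state == SHUTDOWN:
                break


pm = Postmaster()
pm.server_loop(deque([("connect", 42), ("shutdown", None), ("walsender", 42)]))
print(pm.state)
```

The design choice is to leave the individual event handlers alone and rely on
a cheap, repeated check: the worst case becomes a delay of one loop wakeup
instead of a hang.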

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
