Re: Sync Rep and shutdown Re: Sync Rep v19

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Yeb Havinga <yebhavinga(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep and shutdown Re: Sync Rep v19
Date: 2011-03-16 17:35:46
Message-ID: AANLkTi=aSb8Tg9J62ykHe_3_OUop79q1uk6sMF7TLTDR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 16, 2011 at 7:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> The only idea I have for allowing fast shutdown to still be fast, even
>>> when sync rep is involved, is to shut down the system in two phases.
>>> The postmaster would need to stop accepting new connections, and first
>>> kill off all the backends that aren't waiting for sync rep.  Then,
>>> once all remaining backends are waiting for sync rep, we can have them
>>> proceed as above: close the connection without acking the commit or
>>> throwing ERROR/FATAL, and exit.  That's pretty complicated, especially
>>> given the rule that the postmaster mustn't touch shared memory, but I
>>> don't see any alternative.
>>
>> What extra capability are we actually delivering by doing that??
>> The risk of introducing a bug and thereby losing data far outweighs the
>> rather dubious benefit.
>
> Well, my belief is that when users ask the database to shut down, they
> want it to work.  If I'm the only one who thinks that, then whatever.
> But I firmly believe we'll get bug reports about this.

On further review, the approach proposed above doesn't really work,
because a backend can get a SIGTERM either because the system is doing
a fast shutdown or because a user has issued
pg_terminate_backend(PID); and in the latter case we have to continue
letting in connections.

As of right now, synchronous replication continues to wait even when:

- someone tries to perform a fast shutdown
- someone tries to kill the backend using pg_terminate_backend()
- someone attempts to cancel the query using pg_cancel_backend() or by
pressing control-C in, for example, psql
- someone attempts to shut off synchronous replication by changing
synchronous_standby_names in postgresql.conf and issuing pg_ctl reload

We've worked pretty hard to ensure that things like query cancel and
shutdown work quickly and reliably, and I don't think we want to make
synchronous replication the one part of the system that departs from
that general principle.

So, patch attached. This patch arranges to do the following things:

1. If a die interrupt is received (pg_terminate_backend or fast
shutdown), then terminate the sync rep wait and arrange for the
connection to be closed without acknowledging the commit (but do send
a warning message back). The commit still happened, though, so other
transactions will see its effects. This is unavoidable unless we're
willing to either ignore attempts to terminate a backend waiting for
sync rep, or panic the system when it happens, and I don't think
either of those is appropriate.

2. If a query cancel interrupt is received (pg_cancel_backend or ^C),
then cancel the sync rep wait and issue a warning before acknowledging
the commit. Again, the alternative is to either ignore the cancel or
panic, neither of which I believe to be what users will want.

3. If synchronous_standby_names is changed to '' by editing
postgresql.conf and issuing pg_ctl reload, then cancel all waits in
progress and wake everybody up. As I mentioned before, reloading the
config file from within the waiting backend (which can't safely throw
an error) seems risky, so what I did instead is made WAL writer
responsible for handling this. Nobody's allowed to wait for sync rep
unless a global shared memory flag is set, and the WAL writer process
is responsible for setting and clearing this flag when the config file
is reloaded. This has basically no performance cost; WAL writer only
ever does any extra work at all with this code when it receives a
SIGHUP, and even then the work is trivial except in the case where
synchronous_standby_names has changed from empty to non-empty or visca
versa. The advantage of putting this in WAL writer rather than, say,
bgwriter is that WAL writer doesn't have nearly as many jobs to do and
they don't involve nearly as much I/O, so the chances of a long delay
due to the process being busy are much less.

4. Remove the SYNC_REP_MUST_DISCONNECT state, which actually does
absolutely nothing right now, despite what the name would seem to
imply. In particular, it doesn't arrange for any sort of disconnect.
This patch does arrange for that, but not using this mechanism.

5. The existing code relies on being able to read MyProc->syncRepState
without holding the lock, even while a WAL sender must be updating it
in another process. I'm not 100% sure this is safe on a
multi-processor machine with weak memory ordering. In practice, the
chances of something going wrong here seem extremely small. You'd
need something like this: a WAL sender updates MyProc->syncRepState
just after the wait timeout expires and before the latch is reset, but
the regular backend fails to see the state change due to
memory-ordering effects and drops through the loop, waiting another 60
s, and then finally wakes up and completes the wait (but a minute
later than expected). That seems vanishingly unlikely but it's also
simple to protect against, so I did.

Review appreciated.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
sync-rep-wait-fixes.patch application/octet-stream 16.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-03-16 21:38:09 Re: Rectifying wrong Date outputs
Previous Message Tom Lane 2011-03-16 14:27:37 Re: Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,