Re: Sync Rep and shutdown Re: Sync Rep v19

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Yeb Havinga <yebhavinga(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep and shutdown Re: Sync Rep v19
Date: 2011-03-16 08:51:14
Message-ID: 1300265474.20494.7579.camel@ebony
Lists: pgsql-hackers

On Tue, 2011-03-15 at 22:07 -0400, Robert Haas wrote:
> On Wed, Mar 9, 2011 at 11:11 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > Same as above. I think that it's more problematic to leave the code
> > as it is. Because smart/fast shutdown can make the server get stuck
> > until immediate shutdown is requested.
>
> I agree that the current state of affairs is a problem. However,
> after looking through the code somewhat carefully, it looks a bit
> difficult to fix. Suppose that backend A is waiting for sync rep. A
> fast shutdown is performed. Right now, backend A shrugs its shoulders
> and does nothing. Not good. But suppose we change it so that backend
> A closes the connection and exits without either confirming the commit
> or throwing ERROR/FATAL. That seems like correct behavior, since, if
> we weren't using sync rep, the client would have to interpret that as
> indicating that the connection died in mid-COMMIT, and mustn't
> assume anything about the state of the transaction. So far so good.
>
> The problem is that there may be another backend B waiting on a lock
> held by A. If backend A exits cleanly (without a PANIC), it will
> remove itself from the ProcArray and release locks. That wakes up B,
> which can now go do its thing. If the operating system is a bit on
> the slow side delivering the signal to B, then the client to which B
> is connected might manage to see a database state that shows the
> transaction previously running in A as committed, even though that
> transaction wasn't committed. That would stink, because the whole
> point of having A hold onto locks until the standby ack'd the commit
> was that no other transaction would see it as committed until it was
> replicated.
>
> This is a pretty unlikely race condition in practice but people who
> are running sync rep are intending precisely to guard against unlikely
> failure scenarios.
>
> The only idea I have for allowing fast shutdown to still be fast, even
> when sync rep is involved, is to shut down the system in two phases.
> The postmaster would need to stop accepting new connections, and first
> kill off all the backends that aren't waiting for sync rep. Then,
> once all remaining backends are waiting for sync rep, we can have them
> proceed as above: close the connection without acking the commit or
> throwing ERROR/FATAL, and exit. That's pretty complicated, especially
> given the rule that the postmaster mustn't touch shared memory, but I
> don't see any alternative.
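
[Editor's sketch] The two-phase ordering proposed in the quoted text can be modelled roughly as below. This is a toy model under stated assumptions, not PostgreSQL code: the names `State`, `Backend`, and `two_phase_fast_shutdown` are hypothetical, and real backends would be signalled by the postmaster rather than mutated in a loop.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    # All states are hypothetical labels for this illustration only.
    RUNNING = auto()           # ordinary backend, possibly blocked on a lock
    WAITING_SYNC_REP = auto()  # committed locally, waiting for the standby ack
    TERMINATED = auto()        # killed in phase 1
    CLOSED_NO_ACK = auto()     # phase 2: connection closed, commit not acked


@dataclass
class Backend:
    name: str
    state: State


def two_phase_fast_shutdown(backends):
    """Return the ordered list of (phase, backend name) actions taken."""
    actions = []

    # Phase 1: terminate every backend that is not waiting for sync rep,
    # so that nobody is left who could observe locks released in phase 2.
    for b in backends:
        if b.state is State.RUNNING:
            b.state = State.TERMINATED
            actions.append((1, b.name))

    # Phase 2 may only begin once no ordinary backend survives.
    assert all(b.state is not State.RUNNING for b in backends)

    # Phase 2: sync rep waiters close their connections without acking the
    # commit and without raising ERROR/FATAL, then exit.
    for b in backends:
        if b.state is State.WAITING_SYNC_REP:
            b.state = State.CLOSED_NO_ACK
            actions.append((2, b.name))

    return actions
```

With backend A waiting for sync rep and backend B blocked on A's lock, the model terminates B first and only then lets A exit, so B never gets the chance to see A's transaction as committed.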

> We could just not allow fast shutdown, as
> now, but I think that's worse.

Please explain why not allowing fast shutdown makes it worse?

For me, I'd rather not support a whole bunch of dubious code, just to
allow you to type -m fast when you can already type -m immediate.

What extra capability are we actually delivering by doing that?
The risk of introducing a bug and thereby losing data far outweighs the
rather dubious benefit.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
