Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
Date: 2010-03-25 08:08:11
Message-ID: 1269504491.8481.8965.camel@ebony
Lists: pgsql-committers pgsql-docs pgsql-hackers

On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote:
> On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > PANICing won't change the situation, so it just destroys server
> > availability. If we had 1 master and 42 slaves then this behaviour would
> > take down almost the whole server farm at once. Very uncool.
> >
> > You might have reason to prevent the server starting up at that point,
> > when in standby mode, but that is not a reason to PANIC. We don't really
> > want all of the standbys thinking they can be the master all at once
> > either. Better to throw a serious ERROR and have the server still up and
> > available for reads.
>
> OK. How about making the startup process emit a WARNING, stop WAL replay, and
> wait for the trigger file to appear when an invalid record is found?
> That keeps the server up for read-only queries. And if the trigger file is
> found, I think that the startup process should emit a FATAL, i.e., the
> server should exit immediately, to prevent the server from becoming the
> primary in a half-finished state. Also, to allow such a halfway failover,
> should we provide a fast failover mode as pg_standby does?

The lack of docs begins to show a lack of coherent high-level design
here. By now, I've forgotten what this thread was even about. The major
design decision that keeps showing up is "remove pg_standby, at
all costs", but no reason has ever been given for that. I do believe
there is a "better way", but we won't find it by trial and error, even
if we had time to do so.

Please work on some clear docs for the failure modes in this system.
That way we can all read them and understand them, or point out further
issues. Moving straight to code is not a solution to this, since what we
need now is to all agree on the way forwards. If we ignore this, then
there is considerable risk that streaming rep will have a fatal
operational flaw.

Please just document/diagram how it works now, highlighting the problems
that still remain to be solved. We're all behind you and I'm helping
wherever I can.

--
Simon Riggs www.2ndQuadrant.com
