Re: postmaster recovery and automatic restart suppression

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: "Czichy, Thoralf (NSN - FI/Helsinki)" <thoralf(dot)czichy(at)nsn(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, ext Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, ext Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, ext Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Kolb, Harald (NSN - DE/Munich)" <harald(dot)kolb(at)nsn(dot)com>
Subject: Re: postmaster recovery and automatic restart suppression
Date: 2009-06-17 07:36:36
Message-ID: 3f0b79eb0906170036j13f643afjf53c9b134453b3c0@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN - FI/Helsinki)
<thoralf(dot)czichy(at)nsn(dot)com> wrote:
> [STONITH is not always best strategy if failures can be declared as
> user-space software problem only, limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require a
> STONITH - mainly as there's also other software running on the same
> node that we'd not want to automatically switchover (e.g. because it
> takes longer to do or the functionality is more critical or less
> critical). Also we generally trust the HW, OS kernel and cluster
> middleware to behave correctly. These functions also follow the
> principle of fail-fast-and-safe. This trust might be an assumption
> that not everybody agrees with, though. So, if the failure originated
> from HW/OS/Clusterware it clearly is a STONITH situation, but if it's
> a user-space problem - the default assumption is that isolation can
> be implemented on OS-level and that's a guarantee that the clusterware
> gives (using a separate Quorum mechanism to avoid split-brain
> situations).

HW-level STONITH seems to be too much for your case. How about making
your HA middleware shut down the dying postgres before doing the
switchover, for example with "pg_ctl -mi stop"? In that case, other
software can keep running on the original node after the switchover.
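
As a rough illustration of that ordering (stop the local server first,
switch over only afterwards), here is a minimal Python sketch. The function
names, the `switchover` callback and the data directory path are
hypothetical and not part of any particular HA middleware; only the
`pg_ctl stop -D <datadir> -m immediate` invocation is the real command,
written in its long form.

```python
import subprocess
from typing import Callable, List, Sequence


def build_stop_command(pgdata: str, pg_ctl: str = "pg_ctl") -> List[str]:
    """Return the argv for an immediate-mode shutdown of the cluster in pgdata."""
    # Equivalent to "pg_ctl -mi stop", with the data directory given explicitly.
    return [pg_ctl, "stop", "-D", pgdata, "-m", "immediate"]


def stop_then_switchover(
    pgdata: str,
    switchover: Callable[[], None],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Stop the local postgres, then call the (hypothetical) switchover hook.

    Returns True if the switchover hook was invoked, False otherwise.
    """
    rc = run(build_stop_command(pgdata))
    if rc != 0:
        # Assumption of this sketch: if the stop cannot be confirmed, do not
        # switch over. Real middleware would have to decide how to escalate
        # here (for example, whether the server was already down).
        return False
    switchover()
    return True
```

A caller might use it as
`stop_then_switchover("/path/to/pgdata", promote_standby)`, where
`promote_standby` stands for whatever the middleware does to activate the
standby. Other software on the node is left untouched, since only the
postgres instance is stopped.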

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
