From: | "Jehan-Guillaume (ioguix) de Rorthais" <ioguix(at)free(dot)fr> |
---|---|
To: | Shital A <brightuser2019(at)gmail(dot)com> |
Cc: | Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Postgres HA - pacemaker RA do not support auto failback |
Date: | 2019-08-13 22:25:54 |
Message-ID: | 20190814002511.6cd5a68a@firost |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, 13 Aug 2019 16:38:44 +0530
Shital A <brightuser2019(at)gmail(dot)com> wrote:
> On Tue, 13 Aug 2019, 11:50 Shital A, <brightuser2019(at)gmail(dot)com> wrote:
>
> > On Mon, 12 Aug 2019, 18:54 Adrien Nayrat, <adrien(dot)nayrat(at)anayrat(dot)info>
> > wrote:
> >
> >> On 8/12/19 2:57 PM, Shital A wrote:
> >> > Postgres version : 9.6
> >> > OS:Rhel 7.6
> >> >
> >> > We are working on HA setup for postgres cluster of two nodes in
> >> > active-passive mode.
> >> >
> >> > Installed:
> >> > Pacemaker 1.1.19
> >> > Corosync 2.4.3
> >> >
> >> > The pacemaker agent with this installation doesn't support automatic
> >> > failback.
> >> > What I mean by that is explained below:
> >> > 1. Cluster is setup like A - B with A as master.
> >> > 2. Kill services on A, node B will come up as master.
> >> > 3. node A is ready to join the cluster, we have to delete the lock file
> >> > it creates on any one of the node and execute the cleanup command to
> >> > get the node back as standby
> >> >
> >> > Step 3 is manual so HA is not achieved in real sense.
> >> >
> >> > Please help to check:
> >> > 1. Is there any version of the resouce agent which supports automatic
> >> > failback?
> >> > To avoid generation of lock file and deleting it.
> >> >
> >> > 2. If there is no such support, what checks should be added in pgsql RA
> >> > to achieve
> >>
> >> Which RA did you use? AFAIK there is two RA :
> >> - pgsql :
> >> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql
> >> - PAF : http://clusterlabs.github.io/PAF/ (Which is different from pgsql
> >> RA :
> >> http://clusterlabs.github.io/PAF/FAQ.html#why-new-ra-for-postgresql)
> >>
> >> If I am right, PAF should handle the case when there no need to perform
> >> pgrewind
> >> (if the old master is not too advanced in the transaction log).
PAF only make sure controlled failover (aka switchover) performs safely so the
old master hook back as a standby correctly.
Should a failure occurs with an automatic failover, PAF has no dark magic to
automagicaly failoback the failing master as a standby.
> > We are using the pgsql RA installed with pacemaker.
> >
> > Is PAF recommended over pgsql?
Yes, as much as the historical RA.
> > Has anyone changed pgsql to handle the lock file and other cases that
> > might not have been handled?
I don't know. However, I do think it would be either presumptuous to climb this
road today.
> Looks like the previous email bounced, received undelivered message.
> Sending again. We are using the pgsql RA installed with pacemaker and seems
> like it prevents failback by creating a lock file. We want to implement
> auto failback and need pointers on how to implement.
The lock file is here because PgSQL has no safety belt against data corruption
when an old master hook on a new one while its LSN was further away from
the timeline fork. You must take care of this by yourself.
Moreover, should a failover occurs, you should probably check WHY it occurred
and fix things before performing a blindly failback of a potential wounded
instance. A wounded instance can hurt your cluster in many different and
inconfortable ways.
Last, as Adrien wrote, you can manually use pg_rewind to revert the old master
to the timeline fork, as far as you enabled checksums or 'wal_log_hints=on' and
kept enough WAL around.
Another option is just to make sure you are able to build a new standby as was
fast as possible.
But remember: if your data is important, use quorum, fencing and watchdog.
Good luck.
From | Date | Subject | |
---|---|---|---|
Next Message | Derek Hans | 2019-08-13 22:56:40 | Re: GIST/GIN index not used with Row Level Security |
Previous Message | Stephen Frost | 2019-08-13 21:39:30 | Re: GIST/GIN index not used with Row Level Security |