From: John R Pierce <pierce(at)hogranch(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: David Kerr <dmk(at)mr-paradox(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres Clustering Options
Date: 2009-11-11 19:05:32
Message-ID: 4AFB0AFC.9050904@hogranch.com

Greg Smith wrote:
> It sounds like you've got the basics nailed down here and are on a
> well-trod path, just one not documented publicly very well. Since
> you said that even DRBD was too much overhead for you, I think a dive
> into evaluating the commercial clustering approaches (or the free
> LinuxHA that RedHat's is based on, which I haven't been real impressed
> by) would be appropriate. The hard part is generally getting a
> heartbeat between the two servers sharing the SAN that is both
> sensitive enough to catch failures while not being so paranoid that it
> fails over needlessly (say, when load spikes on the primary and it
> slows down). Make sure you test that part out very carefully with any
> vendor you evaluate.

hence the 'multiple dedicated heartbeat networks' previously suggested.

a typical cluster server has a quad-port ethernet card: 2 ports (802.3ad
link aggregation w/ failover) for the LAN and 2 dedicated to the heartbeat,
plus a dual-port HBA for the SAN. the heartbeats can run over crossover
cables; even 10baseT is plenty, since the traffic volume is quite low. all
they need is low latency and no possibility of congestion.
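
to make Greg's sensitivity-vs-paranoia point concrete, here is a rough
python sketch of the standby side of a dual-link heartbeat (addresses,
port, and deadtime are made up, and in practice the cluster manager does
all of this for you). the key detail is that failure is only declared
when BOTH dedicated links have been silent for a full deadtime window,
so a load spike that delays traffic on one path doesn't trigger a
needless failover:

# minimal sketch of a dual-link heartbeat listener (standby side);
# addresses, port, and timings are illustrative, not from a real config
import socket, select, time

LINKS = [("192.168.10.2", 694), ("192.168.11.2", 694)]  # two crossover NICs
DEADTIME = 10.0   # seconds of silence on BOTH links before declaring failure
last_seen = {addr: time.time() for addr in LINKS}

socks = []
for addr in LINKS:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(addr)
    socks.append((s, addr))

while True:
    ready, _, _ = select.select([s for s, _ in socks], [], [], 1.0)
    now = time.time()
    for s, addr in socks:
        if s in ready:
            s.recv(64)              # primary sends a tiny datagram every second
            last_seen[addr] = now
    # only give up when every heartbeat path has been quiet for DEADTIME
    if all(now - t > DEADTIME for t in last_seen.values()):
        print("primary silent on all heartbeat links: fence it, then fail over")
        break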

I set up RHCS (aka the CentOS Cluster Suite) in a test lab environment... it
seemed to work well enough. I was using FC storage via a QLogic SANbox
5600 switch, which is supported by RHCS as a fencing device...

Note that ALL of the storage used by the cluster servers on the SAN
should be under cluster management, as the 'standby' server won't see any
of it when it's fenced (I implemented fencing via FC port disable).
This can be an issue when you want to do rolling upgrades (update the
standby server, force a failover, then update the previous master). I
built each cluster node with its own direct-attached mirrored storage
for the OS and software.
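
the fencing step itself, stripped to its essentials, is just 'disable
the failed node's FC switch port, and refuse to continue if that fails'.
a python sketch of that guard is below; note that fence_san_port is a
made-up stand-in, not the actual RHCS agent for the SANbox, and the real
agent name and flags depend on your switch:

# sketch of a fence-before-failover guard; 'fence_san_port' is a
# hypothetical command standing in for the real fence agent
import subprocess, sys

def fence_node(switch_ip, fc_port):
    # disable the failed node's FC switch port so it can no longer
    # write to the shared SAN volume
    result = subprocess.run(
        ["fence_san_port", "--ip", switch_ip, "--port", str(fc_port),
         "--action", "off"],
        capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit("fencing failed, refusing to fail over: " + result.stderr)

fence_node("10.0.0.5", 7)   # illustrative switch address and FC port number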

> As far as the PostgreSQL specifics go, you need a solid way to ensure
> you've disconnected the now defunct master from the SAN (the classic
> "shoot the other node in the head" problem). All you *should* have to
> do is start the database again on the backup after doing that. That
> will come up as a standard crash, run through WAL replay crash
> recovery, and the result should be no different than had you restarted
> after a crash on the original node. The thing you cannot let happen
> is allowing the original master to continue writing to the shared SAN
> volume once that transition has happened.
>

which is what 'storage fencing' prevents.
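
and once the fence has succeeded, the takeover on the standby is exactly
what Greg describes: start postgres against the shared volume and let
ordinary WAL crash recovery run. roughly, with illustrative paths:

# after a successful fence, bring postgres up on the standby; startup
# runs normal WAL crash recovery, same as restarting after a crash
import subprocess

PGDATA = "/san/pgdata"   # the shared SAN volume, now visible only to this node

subprocess.run(["pg_ctl", "-D", PGDATA, "-w",
                "-l", "/var/log/postgresql/failover.log", "start"],
               check=True)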
