Re: D.R. Site Failover (Streaming Replication) - user access / network options

From: Fernando Hevia <fhevia(at)gmail(dot)com>
To: CS DBA <cs_dba(at)consistentstate(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: D.R. Site Failover (Streaming Replication) - user access / network options
Date: 2016-03-08 18:00:41
Message-ID: CAGYT1XQ0fJX2tEW=VNewf8ayUnF9OfeNG_1d_R_g+YLGtWwq_g@mail.gmail.com
Lists: pgsql-admin

On Tue, Mar 8, 2016 at 1:48 PM, CS DBA <cs_dba(at)consistentstate(dot)com> wrote:

>
> I do however have a few questions related to this, I'm interested to find
> out what others have done here, in particular how do you go about moving
> end users (assuming a web app is the end user entry point) to point
> seamlessly to the secondary site? Also how have you all dealt with the
> possible split brain issue (i.e. we fail over, then the primary site comes
> back up and existing/old connections to the old site then write to the old
> master)
>

While not seamless, you can achieve a pretty fast failover by publishing the
app's DNS record with a short TTL (under 2 min). On failure, have your
monitoring tool fire the failover scripts (promote the Postgres standby,
enable the app server, etc.) and then change the app's DNS record to the
secondary site's IP address. Within a short time your users should be
working on the secondary site.

Cloudflare or Amazon's Route 53 can provide the DNS capability. It is
simple, reliable and cheap.
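A minimal sketch of the failover script described above, assuming Route 53 for DNS and a standby promoted with `pg_ctl promote`. The data directory, record name, IP address and the `enable_app_server` hook are hypothetical placeholders; the Route 53 call expects a boto3 client (`boto3.client("route53")`). With `dry_run=True` it only lists the steps it would take, in order.

```python
import subprocess
from typing import Callable, Dict, List

# Short TTL (under 2 minutes) so resolvers pick up the new address quickly.
DNS_TTL_SECONDS = 60


def promote_command(data_dir: str) -> List[str]:
    """pg_ctl command that promotes a streaming-replication standby."""
    return ["pg_ctl", "promote", "-D", data_dir]


def route53_change_batch(record_name: str, new_ip: str,
                         ttl: int = DNS_TTL_SECONDS) -> Dict:
    """Route 53 ChangeBatch that repoints the app's A record (UPSERT)."""
    return {
        "Comment": "DR failover: point app at secondary site",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": new_ip}],
            },
        }],
    }


def fail_over(data_dir: str, record_name: str, secondary_ip: str,
              enable_app_server: Callable[[], None],
              route53_client=None, hosted_zone_id: str = "",
              dry_run: bool = True) -> List[str]:
    """Run the failover steps in order; with dry_run=True only describe them."""
    steps = []

    # 1. Promote the standby PostgreSQL server at the secondary site.
    cmd = promote_command(data_dir)
    steps.append(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)

    # 2. Enable the app server at the secondary site (hypothetical hook).
    steps.append("enable app server")
    if not dry_run:
        enable_app_server()

    # 3. Point the app's DNS record at the secondary site's IP address.
    batch = route53_change_batch(record_name, secondary_ip)
    steps.append(f"UPSERT {record_name} A {secondary_ip} TTL {DNS_TTL_SECONDS}")
    if not dry_run:
        # route53_client is assumed to be boto3.client("route53").
        route53_client.change_resource_record_sets(
            HostedZoneId=hosted_zone_id, ChangeBatch=batch)
    return steps
```

For example, `fail_over("/var/lib/pgsql/data", "app.example.com.", "203.0.113.10", enable_app_server=lambda: None)` returns the three step descriptions without touching anything. The DNS change goes last so users are only sent to the secondary site once the database and app server there are ready.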

Once the primary site is back, split brain shouldn't be a problem, since
your DNS record will keep pointing traffic to the secondary site until you
intervene to switch back.
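A minimal sketch of a check the monitoring tool could run while the old primary site is back up, to confirm the app's name still resolves only to the secondary site before anyone switches back. The hostname and IP in the usage line are hypothetical. It reports what the machine running it currently resolves; it says nothing about connections already open to the old site.

```python
import socket


def dns_points_to(hostname: str, expected_ip: str) -> bool:
    """True if every IPv4 address that hostname resolves to is expected_ip."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    # Each entry's sockaddr is (address, port); collect the distinct addresses.
    addresses = {info[4][0] for info in infos}
    return addresses == {expected_ip}
```

Usage: `dns_points_to("app.example.com", "203.0.113.10")` should stay `True` until you deliberately switch back.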

Or... you can go with BGP and let the network team do the dirty work at the
routing level. With BGP you should also expect somewhere between 10 and 120
seconds of downtime until the route changes propagate.

Cheers,
Fernando.
