Re: Sync Rep Design

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Josh Berkus <josh(at)postgresql(dot)org>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, greg(at)2ndquadrant(dot)com, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep Design
Date: 2011-01-02 10:39:45
Message-ID: 1293964785.1892.74319.camel@ebony
Lists: pgsql-hackers

On Sat, 2011-01-01 at 22:11 -0500, Aidan Van Dyk wrote:
> On Sat, Jan 1, 2011 at 6:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Sat, 2011-01-01 at 14:40 -0800, Josh Berkus wrote:
> >
> >> Standby in general deals with the A,D,R triangle (Availability,
> >> Durability, Response time). "Any one" configuration is the A,R
> >> configuration, and the only reason to go out with it for 9.1 is
> >> because it's simpler to implement than the D,R configuration (all
> >> standbys must ack).
> >
> > Nicely put. Not the "only reason" though...
> >
> > As I showed earlier, the AR gives you 99.999% availability and the DR
> > gives you 94% availability, considering a 3 server config. If you add
> > more servers, the availability of the DR option gets much worse, very
> > quickly.
> >
> > The performance of AR is much better also, and stays the same or
> > improves as cluster size increases. The DR choice makes performance
> > degrade as cluster size increases, since it works at the speed of the
> > slowest node.
>
> I'm all for getting first-past-the-post in for 9.1. Otherwise I fear
> we'll get nothing.
>
> Stephen and I will only be able to use 1 sync slave, the "DR-site"
> one.

No, the AR and DR options are identical with just one sync standby.

You've been requesting the DR option with 2 standbys, which is what
gives you 94% availability.
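To make the availability comparison concrete, here is a minimal sketch under stated assumptions: each sync standby is up independently with the same probability p, the primary's own availability is ignored, and a commit can be confirmed only while enough standbys are up to satisfy the policy. Under that model "any one" (AR) is a 1-of-n quorum and "all must ack" (DR) is an n-of-n quorum. The function names and p = 0.97 are hypothetical, chosen only for illustration; the sketch is not claimed to reproduce the 99.999% and 94% figures above, which came from an earlier message with its own assumptions.

```python
from math import comb


def quorum_availability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n sync standbys are up, assuming each
    is up independently with probability p (a simplifying assumption)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def any_one_availability(p: float, n: int) -> float:
    # "Any one" (AR): commits proceed while at least 1 of n standbys is up.
    return quorum_availability(p, n, 1)


def all_ack_availability(p: float, n: int) -> float:
    # "All must ack" (DR): commits proceed only while all n standbys are up.
    return quorum_availability(p, n, n)


if __name__ == "__main__":
    p = 0.97  # hypothetical per-standby availability, for illustration only
    for n in (1, 2, 3):
        print(f"n={n}: any one {any_one_availability(p, n):.6f}, "
              f"all ack {all_ack_availability(p, n):.6f}")
```

With n = 1 both functions return the same value, which is the point that AR and DR are identical with a single sync standby. As n grows, the 1-of-n figure rises toward 1 while the n-of-n figure falls as p^n.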

> That's fine. I can live with it, and make my local slave be
> async. Or replicate the FS/block under WAL. I can monitor the ****
> out of it, and unless it goes "down", it should easily be able to keep
> up with the remote sync one behind a slower WAN link.
>
> And I think both Stephen and I understand your availability math.
> We're not disputing that first-past-the-post gives both better query
> availability and better cluster-scale performance.
>
> But when the primary datacenter servers are dust in the crater (or
> boats in the flood, or ash in the fire), I either keep my job, or I
> don't. And that depends on whether there is a chance I (my database
> system) confirmed a transaction that I can't recover.

I'm not impressed. You neglect to mention that Oracle and MySQL would
put you in exactly the same position.

You also neglect to say that if the local standby goes down, you were
advocating a design that would take the whole application down. If you
actually did what you have been suggesting and the cluster went down, as
it inevitably would, then once your colleagues realise that you knowingly
configured the cluster to have only 94% availability, you won't have a
job anymore; you'll be escorted off the premises while shouting "but
while it was down, it lost no data". When that never happens, thank me.
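The performance point quoted earlier (AR runs at the speed of the fastest standby, DR at the speed of the slowest) and the failure case above can both be seen in a toy model of when the primary may confirm a commit. This is a hypothetical sketch, not PostgreSQL's implementation: each sync standby either has an ack latency or is down, and the primary waits for k acks, where k = 1 is "any one" and k = n is "all must ack". The function name and the latencies in the usage note are made up for illustration.

```python
from typing import Optional, Sequence


def commit_wait_ms(ack_latencies_ms: Sequence[Optional[float]], k: int) -> Optional[float]:
    """Hypothetical model: time until the primary can confirm a commit
    when it must wait for k standby acks.

    Each entry is one sync standby's ack latency in ms, or None if that
    standby is down. Returns None when fewer than k standbys can ack,
    meaning the commit blocks indefinitely.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Latencies of the standbys that are actually up, fastest first.
    acks = sorted(x for x in ack_latencies_ms if x is not None)
    if len(acks) < k:
        return None  # not enough live standbys: commits cannot be confirmed
    # The k-th fastest ack releases the commit: k=1 gives the minimum
    # ("any one"), k=len(ack_latencies_ms) gives the maximum ("all must ack").
    return acks[k - 1]
```

For example, with a local standby acking in 2 ms and a remote one in 40 ms, commit_wait_ms([2.0, 40.0], 1) is 2.0 and commit_wait_ms([2.0, 40.0], 2) is 40.0. If the local standby goes down, commit_wait_ms([None, 40.0], 1) is still 40.0, but commit_wait_ms([None, 40.0], 2) is None: under "all must ack" every commit blocks until the local standby returns.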

There are people that need more durability than availability, but not
many. If the database handles high value transactions, they very
probably want it to keep on processing high value transactions.

You'll have the choice of how to configure it, because of me listening
to other people's views and selecting only the ideas that make sense.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
