Re: Replication

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Fujii Masao <fujii(dot)masao(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication
Date: 2006-08-24 18:31:46
Message-ID: 1156444306.15743.221.camel@dogma.v10.wvs
Lists: pgsql-hackers

On Thu, 2006-08-24 at 11:18 +0200, Markus Schiltknecht wrote:
> Hi,
>
> Jeff Davis wrote:
> > I disagree about high-availability. In fact, I would say that sync
> > replication is trading availability and performance for synchronization
> > (which is a valid tradeoff, but costly).
>
> In a way, replication is for database systems what RAID1 is for hard
> drives. Having multiple cluster nodes in sync minimizes the risk of a
> complete outage due to hardware failure. Thus maximizing availability.
> Of course, as you say, traded for performance.
>
> > If you have an async system, all nodes must go down for the system to go
> > down.
>
> Yes. But it takes only one node to go down to potentially lose committed
> transactions. In contrast to synchronous replication systems, where a
> committed transaction is guaranteed to be 'committed on the cluster'. So
> if at least one node of the cluster is up and running, you can be
> assured to have consistent data.

Right, that's the cost of asynchronous replication.

> Please note that the Postgres-R approach does relax some of these
> constraints a little to gain performance. The most obvious result of
> these relaxations is that the nodes may be 'behind' in replaying
> transactions and show a past view of the data.
>
> > If you have a sync system, if any node goes down the system goes down.
>
> That's plain wrong.

Ok, maybe not one node, but I don't think I'm totally off base. See my
explanation below.

> > If you plan on doing failover, consider this: what if it's not obvious
> > which system is still up? What if the network route between the two
> > systems goes down (or just becomes too slow to replicate over), but
> > clients can still connect to both servers? Then you have two systems
> > that both think that the other system went down, and both start
> > accepting transactions. Now you no longer have replication at all.
>
> This problem is often called 'network partitioning', which also refers
> to a more general case: a group of M nodes being split into two groups
> of N and (M-N) nodes (due to network failure or whatever).
>
> In Postgres-R a Group Communication System is used to cover all these
> aspects (error detection, congruent agreement on a major group, etc.).
>

Which doesn't work very well in the case of two groups of servers set up
in two physical locations. I can see two possibilities:
(1) You require a quorum to be effective, in which case your cluster of
databases is only as reliable as the location which holds more servers.
(2) You have another central authority that determines which databases
are up, and which are down. Then your cluster is only as reliable as
that central authority.
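
To make possibility (1) concrete, here is a minimal sketch of a strict
majority rule applied to two locations. The site names, the server
counts, and the helper functions are hypothetical illustrations, not
anything taken from Postgres-R or its group communication system.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Strict majority: more than half of all configured nodes are reachable."""
    return 2 * reachable > total


def keeps_quorum_without(sites: dict[str, int], failed_site: str) -> bool:
    """Return True if the cluster still has a quorum after losing one whole site."""
    total = sum(sites.values())
    reachable = total - sites[failed_site]
    return has_quorum(reachable, total)


# Hypothetical layout: 3 servers in one location, 2 in the other.
sites = {"site_a": 3, "site_b": 2}

# For each site, does the cluster survive the loss of that site?
quorum_after_loss = {name: keeps_quorum_without(sites, name) for name in sites}
```

Under these assumptions the cluster survives losing the smaller site but
not the larger one, and with an even split (2 and 2) neither half can
form a majority on its own.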

Sure, if you have a million groups of servers spread all over the
internet, it works with a very high degree of reliability because you
can likely always form a quorum. However, you then have horrible
performance because the updates need to be spread to so many locations.
And for truly synchronous replication you probably have to serialize the
updates, which is very costly over that many nodes all over a network.

Even if you have a large number of nodes at different locations, you
still end up with strange decisions to make if the network connections
are intermittent or very slow. A temporary slowdown of many nodes could
cause them to be degraded until some kind of human intervention brought
them back. Until that time you might not be able to determine which
nodes make up an authoritative group. This kind of degradation could
happen in the case of a DDoS attack, or perhaps a worm moving around the
internet.

In practice everyone can find a solution that works for them. However,
synchronous replication is not perfect, and there are many failure
scenarios which need to be resolved in a way that fits your business. I
think synchronous replication is inherently less available than
asynchronous.
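
As a rough back-of-the-envelope illustration of that last claim, the
sketch below compares two simplified models. It assumes that nodes fail
independently with the same probability, that an asynchronous cluster
counts as "up" while at least one node is up (possibly with lost or
stale commits), and that a synchronous cluster needs a strict majority.
Those modelling choices and the function names are my own assumptions,
and network partitions are ignored entirely.

```python
from math import comb


def async_availability(n: int, p_down: float) -> float:
    """Probability that at least one of n independent nodes is up."""
    return 1 - p_down ** n


def quorum_availability(n: int, p_down: float) -> float:
    """Probability that a strict majority of n independent nodes is up."""
    need = n // 2 + 1
    p_up = 1 - p_down
    return sum(
        comb(n, k) * p_up ** k * p_down ** (n - k)
        for k in range(need, n + 1)
    )


# Hypothetical numbers: 3 nodes, each down 1% of the time.
a = async_availability(3, 0.01)
q = quorum_availability(3, 0.01)
```

Because "a majority is up" always implies "at least one node is up", the
quorum figure can never exceed the asynchronous one in this model. What
the model does not show is the other side of the tradeoff: the
asynchronous cluster may be up while having lost committed transactions.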

Regards,
Jeff Davis
