Re: Replication

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Fujii Masao <fujii(dot)masao(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication
Date: 2006-08-24 09:18:35
Message-ID: 44ED6EEB.5080000@bluegap.ch
Lists: pgsql-hackers

Hi,

Jeff Davis wrote:
> I disagree about high-availability. In fact, I would say that sync
> replication is trading availability and performance for synchronization
> (which is a valid tradeoff, but costly).

In a way, replication is for database systems what RAID1 is for hard
drives. Keeping multiple cluster nodes in sync minimizes the risk of a
complete outage due to hardware failure, and thus maximizes availability.
Of course, as you say, this is traded for performance.

> If you have an async system, all nodes must go down for the system to go
> down.

Yes. But it takes only one node going down to potentially lose committed
transactions. Synchronous replication systems, in contrast, guarantee
that a committed transaction is 'committed on the cluster'. So if at
least one node of the cluster is up and running, you can be assured of
having consistent data.
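
To illustrate the difference, here is a toy sketch in Python. It is not
Postgres-R code; the Node and Cluster classes, the backlog list and the
lost_after_origin_failure helper are all made up for this example. It
assumes the origin node fails right after acknowledging the commit and
before any asynchronous shipping has happened.

```python
class Node:
    """One cluster node and the transactions it has applied (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.applied = []


class Cluster:
    """Toy cluster that commits either synchronously or asynchronously."""

    def __init__(self, names, synchronous):
        self.nodes = {name: Node(name) for name in names}
        self.synchronous = synchronous
        # Async mode only: (txn, node_name) pairs not yet shipped to peers.
        self.backlog = []

    def commit(self, origin_name, txn):
        origin = self.nodes[origin_name]
        origin.applied.append(txn)
        peers = [n for n in self.nodes.values() if n.up and n is not origin]
        if self.synchronous:
            # Sync: every live peer applies the transaction before the ack.
            for peer in peers:
                peer.applied.append(txn)
        else:
            # Async: the ack is sent first, shipping happens some time later.
            self.backlog.extend((txn, peer.name) for peer in peers)
        return True  # commit acknowledged to the client

    def fail(self, name):
        self.nodes[name].up = False

    def surviving(self):
        """Transactions still visible on at least one live node."""
        seen = set()
        for node in self.nodes.values():
            if node.up:
                seen.update(node.applied)
        return seen


def lost_after_origin_failure(synchronous):
    """Commit on node 'a', then lose 'a' before any backlog is shipped."""
    cluster = Cluster(["a", "b", "c"], synchronous)
    cluster.commit("a", "txn-1")
    cluster.fail("a")
    return "txn-1" not in cluster.surviving()
```

In this model the asynchronous cluster loses the acknowledged
transaction together with node 'a', while the synchronous cluster still
has it on 'b' and 'c'.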

Please note that the Postgres-R approach does relax some of these
constraints a little to gain performance. The most obvious result of
these relaxations is that nodes may lag behind in replaying
transactions and thus show a past view of the data.

> If you have a sync system, if any node goes down the system goes down.

That's plain wrong.

> If you plan on doing failover, consider this: what if it's not obvious
> which system is still up? What if the network route between the two
> systems goes down (or just becomes too slow to replicate over), but
> clients can still connect to both servers? Then you have two systems
> that both think that the other system went down, and both start
> accepting transactions. Now you no longer have replication at all.

This problem is often called 'network partitioning', which also refers
to a more general case: a group of M nodes being split into two groups
of N and (M-N) nodes (due to network failure or whatever).

In Postgres-R, a Group Communication System is used to cover all these
aspects (error detection, congruent agreement on a major group, etc.).
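
As a rough sketch of how agreeing on a major group avoids the scenario
you describe, here is a small Python example. It assumes a plain
strict-majority rule, which is a simplification of my own and not the
actual algorithm of any particular Group Communication System; the
function names are made up for illustration.

```python
def has_majority(group, cluster_size):
    """True if the group holds strictly more than half of all nodes."""
    return 2 * len(group) > cluster_size


def groups_allowed_to_commit(groups):
    """Given the groups a partition left behind, return those that may go on.

    Assumes the groups are disjoint and together cover the whole cluster,
    so the cluster size M is the sum of the group sizes.
    """
    cluster_size = sum(len(group) for group in groups)
    return [group for group in groups if has_majority(group, cluster_size)]


# M = 5 nodes split into groups of N = 2 and M - N = 3 nodes:
# only the three-node group keeps accepting transactions.
five_node_split = groups_allowed_to_commit([{"a", "b"}, {"c", "d", "e"}])

# Two servers that each think the other went down: neither has a majority,
# so under this rule neither one starts accepting transactions on its own.
two_node_split = groups_allowed_to_commit([{"a"}, {"b"}])
```

Under this rule at most one group can ever hold a majority, so two sides
of a partition never both accept transactions. The price is that an even
split leaves no group able to continue.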

Regards

Markus
