From: | Markus Schiltknecht <markus(at)bluegap(dot)ch> |
---|---|
To: | Bill Moran <wmoran(at)potentialtech(dot)com> |
Cc: | Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Bigtime scaling of Postgresql (cluster and stuff I suppose) |
Date: | 2007-08-28 12:47:40 |
Message-ID: | 46D4196C.90401@bluegap.ch |
Lists: | pgsql-general |
Hi,
Bill Moran wrote:
> First off, "clustering" is a word that is too vague to be useful, so
> I'll stop using it. There's multi-master replication, where every
> database is read-write, then there's master-slave replication, where
> only one server is read-write and the rest are read-only. You can
> add failover capabilities to master-slave replication. Then there's
> synchronous replication, where all servers are guaranteed to get
> updates at the same time. And asynchronous replication, where other
> servers may take a while to get updates. These descriptions aren't
> really specific to PostgreSQL -- every database replication system
> has to make design decisions about which approaches to support.
Good explanation!
> Synchronous replication is only
> really used when two servers are right next to each other with a
> high-speed link (probably gigabit) between them.
Why is that so? There's certainly very valuable data that would benefit
from an inter-continental database system. For money transfers, for
example, I'd rather wait half a second for a round trip around the
world to make sure the RDBMS does not 'lose' my money.
> PostgreSQL-R is in development, and targeted to allow multi-master,
> asynchronous replication without rewriting your application. As
> far as I know, it works, but it's still beta.
Sorry, this is nitpicking, but for some reason (see current naming
discussion on -advocacy :-) ), it's "Postgres-R".
Additionally, Postgres-R is considered to be a *synchronous* replication
system, because once you get your commit confirmation, your transaction
is guaranteed to be deliverable and *committable* on all running nodes
(i.e. it's durable and consistent). Or, to put it another way:
asynchronous systems have to deal with conflicting, but already
committed transactions - Postgres-R does not.
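To make that difference concrete, here is a minimal toy sketch in Python. It is not Postgres-R's actual protocol: the names (Node, async_commit, sync_commit, pending) are hypothetical, and the shared `pending` dict merely stands in for an ordering of write sets that all nodes agree on. It only shows that the asynchronous variant confirms first and discovers the conflict later, while the synchronous variant rejects the second writer before confirming anything.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)


def async_commit(node, key, value):
    """Asynchronous style: commit locally and confirm at once.

    Other nodes only learn about the change later.
    """
    node.data[key] = value
    return "committed"


def find_conflicts(a, b):
    """Keys both nodes have already committed with different values."""
    shared = a.data.keys() & b.data.keys()
    return sorted(k for k in shared if a.data[k] != b.data[k])


def sync_commit(pending, origin, key, value):
    """Synchronous style: confirm only once the write is known to be
    committable everywhere.

    `pending` is an assumption of this sketch: a single agreed-upon
    set of in-flight writes visible to all nodes.
    """
    if key in pending:
        return "aborted"  # conflict detected before any confirmation
    pending[key] = (origin.name, value)
    return "committed"


def apply_pending(nodes, pending):
    """Apply all agreed writes on every node, then clear the set."""
    for key, (_origin, value) in pending.items():
        for node in nodes:
            node.data[key] = value
    pending.clear()
```

With two nodes writing the same key, the asynchronous path ends up with two confirmed but conflicting commits that find_conflicts has to report afterwards. The synchronous path confirms exactly one of them, and both nodes converge on it once apply_pending has run.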
Certainly, this is slightly less restrictive than requiring that a
transaction be *committed* on all nodes before confirming the commit
to the client. But as long as a database session is tied to a single
node, this optimization does not alter any transactional semantics. And
despite that restriction, which holds in most real deployments anyway, I
still consider this to be synchronous replication.
[ To get a strictly synchronous system with Postgres-R, you'd have to
delay read-only transactions on a node which hasn't yet applied all
remote transactions. In most cases, that's unwanted. Instead, a
consistent snapshot is enough, just as if the transaction had started
*before* the remote ones which still need to be applied. ]
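The trade-off in the bracketed note can be sketched as a toy model as well. Again, this is only an illustration under assumptions, not how Postgres-R is implemented: Node, commit, read and strict_read are hypothetical names, and a per-node queue stands in for remote transactions that have been delivered but not yet applied.

```python
from collections import deque


class Node:
    def __init__(self):
        self.applied = {}     # state visible to local snapshots
        self.queue = deque()  # delivered remote writes, not yet applied

    def apply_all(self):
        """Apply every delivered remote write in delivery order."""
        while self.queue:
            key, value = self.queue.popleft()
            self.applied[key] = value

    def read(self, key):
        """Relaxed read: use the current local snapshot, as if the
        reader had started before the still-pending remote writes."""
        return self.applied.get(key)

    def strict_read(self, key):
        """Strict read: wait (here: apply) until nothing is pending."""
        self.apply_all()
        return self.applied.get(key)


def commit(origin, others, key, value):
    """Confirm once the write is applied locally and delivered to all
    other nodes; remote application may happen later."""
    origin.applied[key] = value
    for node in others:
        node.queue.append((key, value))
    return "confirmed"
```

After commit(a, [b], "x", 1), a relaxed read on b still returns the old state (a consistent but older snapshot), whereas a strict read first has to catch up. That catching up is the delay most applications would rather avoid.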
> BTW: does anyone know of a link that describes these high-level concepts?
> If not, I think I'll write this up formally and post it.
Hm... some time before 8.3 was released, we had lots of discussions on
-docs about the "high availability and replication" section of the
PostgreSQL documentation. I'd have liked to add these fundamental
concepts, but Bruce - rightly - wanted to keep focused on existing
solutions. And unfortunately, most existing solutions are async,
single-master. So explaining all these wonderful theoretical concepts only
to state that there are no real solutions would have been silly.
Regards
Markus