Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)
Date: 2007-08-28 12:47:40
Message-ID: 46D4196C.90401@bluegap.ch

Hi,

Bill Moran wrote:
> First off, "clustering" is a word that is too vague to be useful, so
> I'll stop using it. There's multi-master replication, where every
> database is read-write, then there's master-slave replication, where
> only one server is read-write and the rest are read-only. You can
> add failover capabilities to master-slave replication. Then there's
> synchronous replication, where all servers are guaranteed to get
> updates at the same time. And asynchronous replication, where other
> servers may take a while to get updates. These descriptions aren't
> really specific to PostgreSQL -- every database replication system
> has to make design decisions about which approaches to support.

Good explanation!

> Synchronous replication is only
> really used when two servers are right next to each other with a
> high-speed link (probably gigabit) between them.

Why is that so? There's certainly very valuable data that would benefit
from an inter-continental database system. For money transfers, for
example, I'd rather wait half a second for a round trip around the
world, to make sure the RDBMS does not 'lose' my money.
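
As a rough sanity check on that half-second figure, here is a
back-of-the-envelope estimate. It assumes light travels through fibre at
about 200,000 km/s and that the signal goes to the far side of the globe
and back along the equator (about 40,075 km in total). Both numbers are
assumptions for illustration; real routes are longer and add switching
and protocol overhead, so treat the result as a lower bound.

```python
# Back-of-the-envelope latency estimate (assumed figures, not measurements).
EARTH_CIRCUMFERENCE_KM = 40_075   # equatorial circumference
FIBRE_SPEED_KM_PER_S = 200_000    # roughly 2/3 of c, typical for optical fibre


def propagation_delay_s(distance_km: float) -> float:
    """Pure propagation delay over distance_km of fibre, ignoring routing and queuing."""
    return distance_km / FIBRE_SPEED_KM_PER_S


# To the far side of the globe and back is one full circumference.
round_trip_s = propagation_delay_s(EARTH_CIRCUMFERENCE_KM)
print(f"antipodal round trip: {round_trip_s * 1000:.0f} ms")
```

Under these assumptions the propagation delay alone is about 0.2 s per
round trip, so half a second leaves some room for overhead.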

> PostgreSQL-R is in development, and targeted to allow multi-master,
> asynchronous replication without rewriting your application. As
> far as I know, it works, but it's still beta.

Sorry, this is nitpicking, but for some reason (see current naming
discussion on -advocacy :-) ), it's "Postgres-R".

Additionally, Postgres-R is considered to be a *synchronous* replication
system, because once you get your commit confirmation, your transaction
is guaranteed to be deliverable and *committable* on all running nodes
(i.e. it's durable and consistent). Or, to put it another way: asynchronous
systems have to deal with conflicting, but already committed
transactions - Postgres-R does not.

Certainly, this is slightly less restrictive than saying that a
transaction needs to be *committed* on all nodes before confirming the
commit to the client. But as long as a database session is tied to a
node, this optimization does not alter any transactional semantics. And
despite that restriction, which mostly holds in practice anyway, I
still consider this to be synchronous replication.

[ To get a strictly synchronous system with Postgres-R, you'd have to
delay read-only transactions on a node which hasn't applied all remote
transactions yet. In most cases, that's unwanted. Instead, a consistent
snapshot is enough, just as if the transaction started *before* the
remote ones which still need to be applied. ]
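
To make the distinction concrete, here is a toy model in Python. It is
only a sketch and not how Postgres-R is implemented: the names Cluster,
Node, certify and apply_pending are made up for illustration, and a
shared in-memory list stands in for an agreed, totally ordered stream of
writes. It shows a commit being confirmed once it is known to be
committable everywhere, a conflicting transaction being aborted before
any confirmation, and a node reading from an older but consistent
snapshot without waiting.

```python
class Cluster:
    """Stands in for the agreed, totally ordered stream of certified writes."""

    def __init__(self):
        self.log = []         # certified (key, value) writes, same order on every node
        self.last_write = {}  # key -> log index of the latest certified write

    def certify(self, key, value, snapshot_pos):
        # Conflict: a write to `key` was certified that this snapshot cannot see.
        if self.last_write.get(key, -1) >= snapshot_pos:
            return False      # abort; no commit confirmation is ever sent
        self.log.append((key, value))
        self.last_write[key] = len(self.log) - 1
        return True           # committable everywhere, even if not yet applied


class Node:
    def __init__(self, cluster):
        self.cluster = cluster
        self.applied = 0      # how many log entries this node has applied
        self.data = {}

    def begin(self):
        return self.applied   # snapshot = everything applied so far

    def read(self, key):
        return self.data.get(key)  # never waits for pending remote writes

    def commit_write(self, key, value, snapshot_pos):
        return self.cluster.certify(key, value, snapshot_pos)

    def apply_pending(self):
        for key, value in self.cluster.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.cluster.log)


cluster = Cluster()
a, b = Node(cluster), Node(cluster)

snap_a, snap_b = a.begin(), b.begin()          # both start from the same state
ok_a = a.commit_write("balance", 100, snap_a)  # certified first
ok_b = b.commit_write("balance", 250, snap_b)  # conflicts, so it is aborted
stale = b.read("balance")                      # b has not applied a's write yet
b.apply_pending()
fresh = b.read("balance")
print(ok_a, ok_b, stale, fresh)
```

In this model an asynchronous multi-master system would correspond to
confirming both writes locally and reconciling them afterwards, which is
exactly the situation the certification step avoids.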

> BTW: does anyone know of a link that describes these high-level concepts?
> If not, I think I'll write this up formally and post it.

Hm.. some time before 8.3 was released, we had lots of discussions on
-docs about the "high availability and replication" section of the
PostgreSQL documentation. I'd have liked to add these fundamental
concepts, but Bruce - rightly - wanted to keep focused on existing
solutions. And unfortunately, most existing solutions are async,
single-master. So explaining all these wonderful theoretical concepts only
to state that there are no real solutions would have been silly.

Regards

Markus
