Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)
Date: 2007-08-28 14:21:02
Message-ID: 20070828102102.a641bfcb.wmoran@potentialtech.com
Lists: pgsql-general

In response to Markus Schiltknecht <markus(at)bluegap(dot)ch>:

> Hi,
>
> Bill Moran wrote:
> > First off, "clustering" is a word that is too vague to be useful, so
> > I'll stop using it. There's multi-master replication, where every
> > database is read-write, then there's master-slave replication, where
> > only one server is read-write and the rest are read-only. You can
> > add failover capabilities to master-slave replication. Then there's
> > synchronous replication, where all servers are guaranteed to get
> > updates at the same time. And asynchronous replication, where other
> > servers may take a while to get updates. These descriptions aren't
> > really specific to PostgreSQL -- every database replication system
> > has to make design decisions about which approaches to support.
>
> Good explanation!
>
> > Synchronous replication is only
> > really used when two servers are right next to each other with a
> > high-speed link (probably gigabit) between them.
>
> Why is that so? There's certainly very valuable data which would
> benefit from an inter-continental database system. For money transfers,
> for example, I'd rather wait half a second for a round trip around the
> world, to make sure the RDBMS does not 'lose' my money.

While true, I feel those applications are the exception, not the rule.
Most databases these days sit behind blogs, image galleries, and the
like, and those neither need nor want the overhead associated with
synchronous replication.

> > PostgreSQL-R is in development, and targeted to allow multi-master,
> > asynchronous replication without rewriting your application. As
> > far as I know, it works, but it's still beta.
>
> Sorry, this is nitpicking, but for some reason (see current naming
> discussion on -advocacy :-) ), it's "Postgres-R".

Sorry.

> Additionally, Postgres-R is considered to be a *synchronous* replication
> system, because once you get your commit confirmation, your transaction
> is guaranteed to be deliverable and *committable* on all running nodes
> (i.e. it's durable and consistent). Or to put it another way: asynchronous
> systems have to deal with conflicting, but already committed
> transactions - Postgres-R does not.

I find that line fuzzy. It's synchronous for the reason you describe,
but it's asynchronous because a query that has returned successfully
is not _guaranteed_ to be committed everywhere yet. Seems like we're
dealing with a limitation in the terminology :)

> Certainly, this is slightly less restrictive than saying that a
> transaction needs to be *committed* on all nodes, before confirming the
> commit to the client. But as long as a database session is tied to a
> node, this optimization does not alter any transactional semantics. And
> despite that limitation, which is mostly the case in reality anyway, I
> still consider this to be synchronous replication.

This could potentially be a problem in, for example, a web application,
where a particular user's requests may be load-balanced to another
node at any time. Of course, you just have to write the application
with that knowledge.
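
To make that concrete, here is a toy sketch in Python. It is not how
Postgres-R is implemented; the Node class and the commit() helper are
hypothetical names made up for illustration. It only assumes what was
described above: the commit is confirmed once the write is guaranteed
to be delivered to the other nodes, and each node applies it later.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica: applied data plus remote writes not yet applied."""
    name: str
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def apply_pending(self):
        # Apply queued remote writes in the order they were delivered.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


def commit(origin, others, key, value):
    """Hypothetical commit: apply locally, deliver to the other nodes,
    then confirm. The other nodes apply the write at some later point."""
    origin.data[key] = value
    for node in others:
        node.pending.append((key, value))
    return "COMMIT"


a, b = Node("a"), Node("b")
status = commit(a, [b], "balance", 100)

sticky_read = a.data.get("balance")    # session stays on node a
balanced_read = b.data.get("balance")  # session was moved to node b
b.apply_pending()                      # node b catches up
later_read = b.data.get("balance")

print(status, sticky_read, balanced_read, later_read)
# prints: COMMIT 100 None 100
```

The read on node b is still consistent (it looks like a transaction
that started before the write), but a user who was just told "COMMIT"
and is then bounced to node b will not see their own change until b
has applied it. Keeping a session on one node avoids that.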

> [ To get a strictly synchronous system with Postgres-R, you'd have to
> delay read only transactions on a node which hasn't applied all remote
> transactions, yet. In most cases, that's unwanted. Instead, a consistent
> snapshot is enough, just as if the transaction started *before* the
> remote ones which still need to be applied. ]

Agreed.

> > BTW: does anyone know of a link that describes these high-level concepts?
> > If not, I think I'll write this up formally and post it.
>
> Hm.. sometime before 8.3 was released, we had lots of discussions on
> -docs about the "high availability and replication" section of the
> PostgreSQL documentation. I'd have liked to add these fundamental
> concepts, but Bruce - rightly - wanted to keep focused on existing
> solutions. And unfortunately, most existing solutions are async,
> single-master. So explaining all these wonderful theoretical concepts
> only to state that there are no real solutions would have been silly.

Someone else posted a link, and the docs look pretty comprehensive at this
point ... enough so that I'm not going to bother writing up my own
explanation.

--
Bill Moran
http://www.potentialtech.com
