Replication Docs

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: pgsql-docs(at)postgresql(dot)org
Cc: bruce(at)momjian(dot)us
Subject: Replication Docs
Date: 2006-11-22 09:02:05
Message-ID: 4564120D.80304@bluegap.ch
Lists: pgsql-docs

Hello Bruce,

I have tried to collect all comments by section, hence the new thread.
Hope that helps.

*** Synchronous Multi-Master Replication ***

Bruce Momjian wrote:
> OK, new title is "Synchronous Multi-Master Replication", and the next
> heading is "Asynchronous Multi-Master Replication".

Good, I really like that one. :-)

>> Why not simply call it "Multi Master Replication"? That implies
>> clustering, doesn't it?
>
> Well, not really because of the async multi-master that is the next
> item.

Yes, it's fine that way. I was just unsure whether you wanted to have
sync and async in one paragraph or not. The proposal "Multi Master
Replication" would only fit if we described both in one paragraph. I
prefer describing both in more detail, as you have done now.

>> BTW, I'm slowly beginning to accept that you don't want to mix
>> "Statement-Based Replication Middleware" with "Multi Master
>> Replication". ;-)
>
> OK, are they mixed now?

No, they're not. They're split, which I think is what you want. What
I've been uncomfortable with was that split into "Statement-Based
Replication Middleware" and "Synchronous Multi-Master Replication". I've
been arguing that the first describes one possible implementation of the
second, while other implementations are not described (2PC, SHMEM,
Postgres-R, etc.).

I was trying to say that I'm beginning to accept that split, especially
because pgpool really seems to put a lot of those burdens on the user.
I've been trying to use some humor, but that mainly seems to confuse
people. My English might not be good enough for humor yet.

However, where do you now fit Sequoia in? It uses "statement-based
replication", but AFAIK it is much more clever than pgpool and handles
non-deterministic functions. And the Sequoia people probably won't be
happy about not being called "Multi-Master Replication".
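
To illustrate why non-deterministic functions matter for statement-based
replication, here is a minimal sketch. It is not a description of how
Sequoia or pgpool actually work; the Replica class and the two statement
names are hypothetical stand-ins. It only shows that replaying the same
statement on every server can diverge, while evaluating the function once
in the middleware and sending the resulting value cannot.

```python
import random


class Replica:
    """Hypothetical stand-in for one database server."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # each server has its own random state
        self.rows = []

    def execute(self, stmt, value=None):
        if stmt == "INSERT_RANDOM":
            # Stands in for: INSERT INTO t VALUES (random())
            self.rows.append(self.rng.random())
        elif stmt == "INSERT_VALUE":
            # Stands in for: INSERT INTO t VALUES (<literal>)
            self.rows.append(value)


def naive_broadcast(replicas):
    """Send the unmodified statement to every replica."""
    for replica in replicas:
        replica.execute("INSERT_RANDOM")


def rewriting_broadcast(replicas, rng):
    """Evaluate the function once, then send the literal value everywhere."""
    value = rng.random()
    for replica in replicas:
        replica.execute("INSERT_VALUE", value)
```

With naive_broadcast the replicas end up with different rows; with
rewriting_broadcast they stay identical.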

Bruce Momjian wrote:
> I just saw it [the slides about PGCluster-II]. It does seem more like
> Oracle RAC than any other method.

Yes. I think it's not production ready, yet, so there's no point in
mentioning it in the documentation.

Bruce Momjian wrote:
> I figured that shared-disk/memory only really makes sense for
> multi-master clustering, so I mentioned it in that paragraph:
>
> ...<snipped the new paragraph>
>
> Is that enough?

I'd say so, yes. We are not going into more detail on other aspects, so
that's fine.

You might not even need to mention shared-memory. I don't know of any
implementation in the database world, except perhaps using OpenMosix and
running PostgreSQL on top of it. Maybe just leave it in there, it won't
hurt.

Bruce Momjian wrote:
> One problem I have is that we have shared disk failover, but no
> other shared case with a PostgreSQL implementation, and people don't
> want to mention Oracle RAC, so why do we mention it if we have no
> implementations even in the works.

Most probably you're already aware that with PGCluster-II we have such
an implementation in the works.

*** Asynchronous Multi-Master Replication ***

>> Again, IMHO, "Parallel Query Execution" says everything. The word
>> 'Clustering' does not help, because it's not defined nor commonly
>> used in any helpful way (probably besides marketing).
>
> OK, new title is Multi-Server Parallel Query Execution. If I have
> just "Parallel Query Execution", it could be multi-process parallel
> query execution.

Yes, the new title is good.

In the text below, you are mainly describing what I call 'disconnected
operation' (does somebody have a better, more common term for that?).
But the main advantage of async replication is having no delay before
commit, which gives better performance for writing transactions.

With async multi-master replication, conflicts can arise and have to be
resolved. I think your example does not make it clear that this applies
to async multi-master replication in general, and that such conflicts
can sometimes be resolved automatically.
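
As a rough illustration of automatic conflict resolution, here is a
minimal sketch of a "last writer wins" rule. This is only one possible
policy, chosen for the example; the Version class, its fields, and the
node-id tie-breaker are assumptions and do not describe any particular
replication product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One committed version of a row, as seen by one master (hypothetical)."""
    value: str
    timestamp: float  # commit time at the originating master
    node_id: int      # tie-breaker when timestamps are equal


def resolve(local: Version, remote: Version) -> Version:
    """Pick the winner of two conflicting versions of the same row.

    The later commit wins; on equal timestamps the higher node id wins,
    so every node makes the same choice regardless of argument order.
    """
    return max(local, remote, key=lambda v: (v.timestamp, v.node_id))


# Two masters commit locally without waiting for each other ...
a = Version("alice@example.org", timestamp=100.0, node_id=1)
b = Version("alice@example.com", timestamp=100.5, node_id=2)

# ... and both converge on the same version once the changes are exchanged.
winner_on_node_1 = resolve(a, b)
winner_on_node_2 = resolve(b, a)
```

Such a rule resolves the conflict without user intervention, but it
silently discards the losing write, which is why only some conflicts are
suitable for automatic resolution.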

*** Multi-Master Parallel Query Execution ***

Bruce Momjian wrote:
> Uh, multi-master replication allows for load balancing, but it doesn't
> help a single query to run any faster. Think of having only one query
> running on the cluster. Parallel execution allows a single query to
> use more than one computer, right?

Right.

> Uh, this confuses me. What is missing? You split tables across
> multiple servers.

In "Multi-Master Parallel Query Execution" you write: "One possible way
this could work is for the data to be split among servers". So the
example you give involves Data Partitioning.

I wanted to point out that another way to do Parallel Query Execution is
to use Multi-Master Replication to keep identical replicas and then
query them in parallel. I don't think there is any solution for that
yet, except that perhaps PGPool-II can do it?
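
A minimal sketch of that idea, assuming every replica holds the complete
table: a single aggregate query is split into disjoint ranges, each range
is evaluated on a different replica, and the partial results are combined.
The in-memory lists and the partial_sum and parallel_sum helpers are
hypothetical stand-ins, not the behaviour of PGPool-II or any other tool.

```python
from concurrent.futures import ThreadPoolExecutor

# Every replica holds the complete table (in-memory stand-in for real servers).
TABLE = list(range(1, 1001))
REPLICAS = [list(TABLE), list(TABLE), list(TABLE)]


def partial_sum(replica, lo, hi):
    """Stand-in for running the aggregate over rows lo..hi-1 on one replica."""
    return sum(replica[lo:hi])


def parallel_sum(replicas):
    """Split one aggregate query across identical replicas and combine."""
    n = len(replicas[0])
    step = -(-n // len(replicas))  # ceiling division
    jobs = [
        (replica, i * step, min((i + 1) * step, n))
        for i, replica in enumerate(replicas)
    ]
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        parts = pool.map(lambda job: partial_sum(*job), jobs)
        return sum(parts)
```

Unlike the data-partitioning example, no server is limited to a fixed
subset of the data here, so the ranges can be chosen freely per query.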

*** Introduction Text on the top ***

Bruce Momjian wrote:
> OK, updated to add "little" delay, and removed "small" from async
> case:
>
> load-balanced servers will return consistent results with little
> propagation delay. Asynchronous updating has a delay between the

Hm, that does not address my concerns. But after thinking about it, I
can accept the term 'consistent results' - it's clear enough what it
means. I'm probably going into too much detail...

But now, the "little delay" is certainly in the wrong place. Such
delays occur before commit, not before returning results.

Maybe revert it to "..no propagation delay". Or leave out the "no
propagation delay" completely.

Sorry for the noise here.

Regards

Markus
