Re: Replication Docs

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: pgsql-docs(at)postgresql(dot)org
Subject: Re: Replication Docs
Date: 2006-11-22 17:36:54
Message-ID: 200611221736.kAMHasi00788@momjian.us
Lists: pgsql-docs

Markus Schiltknecht wrote:
> Hello Bruce,
>
> I was trying to put together all comments to specific sections, thus the
> new thread. Hope that helps.
>
> *** Synchronous Multi-Master Replication ***
>
> Bruce Momjian wrote:
> > OK, new title is "Synchronous Multi-Master Replication", and the next
> > heading is "Asynchronous Multi-Master Replication".
>
> Good, I really like that one. :-)

Great (until we change it again) ;-)

> >> Why not simply call it "Multi Master Replication"? That implies
> >> clustering, doesn't it?
> >
> > Well, not really because of the async multi-master that is the next
> > item.
>
> Yes, it's fine that way. I was just unsure if you want to have sync and
> async in one paragraph or not. The proposal "Multi Master Replication"
> would only fit if we'd describe both in one paragraph. I like to
> describe both in more detail, as you did now.

OK, it is two separate entries now:

http://momjian.us/main/writings/pgsql/sgml/high-availability.html

> >> BTW, I'm slowly beginning to accept that you don't want to mix
> >> "Statement-Based Replication Middleware" with "Multi Master
> >> Replication". ;-)
> >
> > OK, are they mixed now?
>
> No, they're not. They're split, which I think is what you want. What
> I've been uncomfortable with was that split into "Statement-Based
> Replication Middleware" and "Synchronous Multi-Master Replication". I've been
> arguing that the first describes one possible implementation of the
> second, while other implementations are not described (2PC, SHMEM,
> Postgres-R, etc...)
>
> I was trying to say that I'm beginning to accept that split, because
> pgpool especially really seems to put a lot of those burdens on the
> user. I've been trying to use some humor, but that mainly seems to
> confuse people. My English might not be good enough for humor yet.
>
> However, where do you now fit Sequoia in? It uses "statement-based
> replication", but AFAIK it is much more clever than pgpool and handles
> non-deterministic functions. And the Sequoia people probably won't get
> excited about not calling them "Multi-Master Replication".

Uh, good point. The title is now "Statement-Based Replication
Middleware". That doesn't say multi-master, but it doesn't say
master/slave either. The Sequoia PDF you sent me is very detailed:

http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf

I think we are back to the issue of classification. We have traditional
master/slave such as Slony, and multi-master such as perhaps PGCluster,
and lots in between. I am thinking pgpool and Sequoia fit in there. I have
added Sequoia to the Statement-Based Replication Middleware section.
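
To make the non-deterministic function issue concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions, not a
description of how pgpool or Sequoia are implemented: two in-memory SQLite
databases stand in for the replicas, and the helper names (make_replica,
broadcast, tokens) are hypothetical.

```python
import random
import sqlite3


def make_replica():
    """Create one stand-in replica (an in-memory SQLite database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, token INTEGER)")
    return conn


def broadcast(replicas, sql, params=()):
    """Statement-based replication: send the same statement to every replica."""
    for conn in replicas:
        conn.execute(sql, params)


def tokens(replicas, row_id):
    """Read back the stored token for one row from every replica."""
    return [
        conn.execute("SELECT token FROM t WHERE id = ?", (row_id,)).fetchone()[0]
        for conn in replicas
    ]


replicas = [make_replica(), make_replica()]

# Naive broadcast: each replica evaluates random() on its own, so the
# stored values will almost certainly differ between replicas.
broadcast(replicas, "INSERT INTO t (id, token) VALUES (1, random())")
naive = tokens(replicas, 1)

# Middleware-style handling (assumed approach): evaluate the
# non-deterministic value once, then broadcast only the resulting constant.
value = random.getrandbits(32)
broadcast(replicas, "INSERT INTO t (id, token) VALUES (2, ?)", (value,))
rewritten = tokens(replicas, 2)
```

In the second case every replica stores the same value, because the
statement it receives no longer contains anything non-deterministic.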

> Bruce Momjian wrote:
> > I just saw it [the slides about PGCluster-II]. It does seem more like
> > Oracle RAC than any other method.
>
> Yes. I think it's not production ready, yet, so there's no point in
> mentioning it in the documentation.

OK.

> Bruce Momjian wrote:
> > I figured that shared-disk/memory only really makes sense for
> > multi-master clustering, so I mentioned it in that paragraph:
> >
> > ...<snipped the new paragraph>
> >
> > Is that enough?
>
> I'd say so, yes. We are not going into more details for other aspects so
> that's fine.

OK.

> You might not even mention shared-memory. I don't know of any
> implementation in the database world. Except perhaps using OpenMosix and
> running PostgreSQL on top of it. Maybe just leave it in there, it won't
> hurt.

OK, I will only mention shared disk now.

> Bruce Momjian wrote:
> > One problem I have is that we have shared disk failover, but no
> > other shared case with a PostgreSQL implementation, and people don't
> > want to mention Oracle RAC, so why do we mention it if we have no
> > implementations even in the works?
>
> Most probably you're already aware that with PGCluster-II we have such
> an implementation in the works.

I do now. :-) I think we are OK with the additional sentence about
shared disk in the Synchronous Multi-Master Replication section, right?

> *** Asynchronous Multi-Master Replication ***
>
> >> Again, IMHO, "Parallel Query Execution" says everything. The word
> >> 'Clustering' does not help, because it's not defined nor commonly
> >> used in any helpful way (probably besides marketing).
> >
> > OK, new title is Multi-Server Parallel Query Execution. If I have
> > just "Parallel Query Execution", it could be multi-process parallel
> > query execution.
>
> Yes, the new title is good.
>
> In the text below, you are mainly describing what I call 'disconnected
> operation' (does somebody have a better, more common term for that?).
> But the main advantage of async replication is having no delay before
> commit, thus giving better performance for writing transactions.
>
> In the case of async multi-master replication, conflicts can arise, which
> have to be resolved. I think your example does not make it clear that
> this applies to async multi-master replication in general, and that
> those conflicts can sometimes be resolved automatically.

OK, good point, section updated:

<term>Asynchronous Multi-Master Replication</term>
<listitem>

<para>
For servers that are not regularly connected, like laptops or
remote servers, keeping data consistent among servers is a
challenge. Using asynchronous multi-master replication, each
server works independently, and periodically communicates with
the other servers to identify conflicting transactions. The
conflicts can be resolved by users or conflict resolution rules.
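
As a rough illustration of the conflict identification described in that
paragraph, here is a minimal Python sketch. Everything in it is an
assumption made for the example: the Server class, the synchronize
function, and the "last writer wins" rule are hypothetical and do not
describe any particular PostgreSQL replication tool.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    data: dict = field(default_factory=dict)     # key -> (value, timestamp)
    pending: dict = field(default_factory=dict)  # local writes since last sync

    def write(self, key, value, timestamp):
        """Commit locally with no delay; replication happens later."""
        self.data[key] = (value, timestamp)
        self.pending[key] = (value, timestamp)


def last_writer_wins(a, b):
    """Hypothetical resolution rule: keep the version with the later timestamp."""
    return a if a[1] >= b[1] else b


def synchronize(s1, s2, resolve=last_writer_wins):
    """Exchange pending writes and return the keys written on both servers."""
    conflicts = sorted(set(s1.pending) & set(s2.pending))
    for key in set(s1.pending) | set(s2.pending):
        if key in conflicts:
            winner = resolve(s1.pending[key], s2.pending[key])
        elif key in s1.pending:
            winner = s1.pending[key]
        else:
            winner = s2.pending[key]
        s1.data[key] = winner
        s2.data[key] = winner
    s1.pending.clear()
    s2.pending.clear()
    return conflicts


laptop = Server("laptop")
office = Server("office")
laptop.write("customer:1", "Alice", 10)   # written while disconnected
office.write("customer:1", "Alicia", 12)  # same key, written elsewhere
office.write("customer:2", "Bob", 11)     # no conflict

conflicts = synchronize(laptop, office)
```

Passing a different resolve function, or handing the conflicting keys to a
user instead, corresponds to the two resolution options named in the text.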

>
>
> *** Multi-Master Parallel Query Execution ***
>
> Bruce Momjian wrote:
> > Uh, multi-master replication allows for load balancing, but it doesn't
> > help a single query to run any faster. Think of having only one query
> > running on the cluster. Parallel execution allows a single query to
> > use more than one computer, right?
>
> Right.
>
> > Uh, this confuses me. What is missing? You split tables across
> > multiple servers.
>
> In "Multi-Master Parallel Query Execution" you write: "One possible way
> this could work is for the data to be split among servers". So the
> example you give involves Data Partitioning.

OK.
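
For reference, the partitioned case can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the three
lists stand in for table partitions held by three hypothetical servers,
and threads stand in for the servers working at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands in for the rows on one server.
partitions = {
    "server_a": [120, 80, 45],
    "server_b": [300, 10],
    "server_c": [55, 65, 75, 5],
}


def partial_sum(rows):
    """Work done on one server: aggregate only the local partition."""
    return sum(rows), len(rows)


def parallel_average(partitions):
    """Send the query to every server, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions.values()))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


average = parallel_average(partitions)
```

Each server scans only its own share of the rows, which is where a single
query can gain from using more than one computer.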

> I wanted to point out that another way to do Parallel Query Execution is
> using Multi-Master Replication to have equal replicas and then query
> them in parallel. I don't think there is any solution for that, yet.
> Except, perhaps PGPool-II can do it?

Uh, if the data isn't partitioned, what value is there to hitting
multiple servers for a single query? I am confused.

> *** Introduction Text on the top ***
>
> Bruce Momjian wrote:
> > OK, updated to add "little" delay, and removed "small" from async
> > case:
> >
> > load-balanced servers will return consistent results with little
> > propagation delay. Asynchronous updating has a delay between the
>
> Hm, that does not address my concerns. But after thinking about it, I
> can accept the term 'consistent results' - it's clear enough what it
> means. I'm probably going into too much detail...

OK.

> But now, the "little delay" is certainly in the wrong place. Such
> delays occur before commit, not before returning results.

Uh, I don't think the "little" appears to talk about the results but only
the propagation.

> Maybe revert it back to "..no propagation delay". Or leave out the
> "no propagation delay" completely.

OK, how is this new text?

This guarantees that a failover will not lose any data and that
all load-balanced servers will return consistent results no matter
which server is queried.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
