Re: Replication Docs

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: pgsql-docs(at)postgresql(dot)org
Subject: Re: Replication Docs
Date: 2006-11-22 17:36:54
Message-ID: 200611221736.kAMHasi00788@momjian.us
Lists: pgsql-docs
Markus Schiltknecht wrote:
> Hello Bruce,
> 
> I was trying to put together all comments to specific sections, thus the 
> new thread. Hope that helps.
> 
> *** Synchronous Multi-Master Replication ***
> 
> Bruce Momjian wrote:
>  > OK, new title is "Synchronous Multi-Master Replication", and the next
>  > heading is "Asynchronous Multi-Master Replication".
> 
> Good, I really like that one. :-)

Great (until we change it again)  ;-)

>  >> Why not simply call it "Multi Master Replication"? That implies
>  >> clustering, doesn't it?
>  >
>  > Well, not really because of the async multi-master that is the next
>  > item.
> 
> Yes, it's fine that way. I was just unsure if you want to have sync and 
> async in one paragraph or not. The proposal "Multi Master Replication" 
> would only fit if we'd describe both in one paragraph. I like to 
> describe both in more detail, as you did now.

OK, it is two separate entries now:

	http://momjian.us/main/writings/pgsql/sgml/high-availability.html

>  >> BTW, I'm slowly beginning to accept that you don't want to mix
>  >> "Statement-Based Replication Middleware" with "Multi Master
>  >> Replication". ;-)
>  >
>  > OK, are they mixed now?
> 
> No, they're not. They're split, which I think is what you want. What 
> I've been uncomfortable with is that split into "Statement-Based 
> Replication Middleware" and "Synchronous Multi-Master Replication". 
> I've been arguing that the first describes one possible implementation 
> of the second, while other implementations are not described (2PC, 
> SHMEM, Postgres-R, etc...)
> 
> I was trying to say that I'm beginning to accept that split, because 
> pgpool especially really seems to put a lot of those burdens on the 
> user. I've been trying to use some humor, but that mainly seems to 
> confuse people. My English might not be good enough for humor, yet.
> 
> However, where do you now fit Sequoia in? It uses "statement-based 
> replication", but AFAIK it is much more clever than pgpool and handles 
> non-deterministic functions. And the Sequoia people probably won't get 
> excited about not calling them "Multi-Master Replication".

Uh, good point.  The title is now "Statement-Based Replication
Middleware".  That doesn't say multi-master, but it doesn't say
master/slave either.  The Sequoia PDF you sent me is very detailed:

  http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf

I think we are back to the issue of classification.  We have traditional
master/slave such as Slony, and multi-master such as perhaps pgcluster,
and lots in between.  I am thinking pgpool and Sequoia fit in there.  I
have added Sequoia to the Statement-Based Replication Middleware section.

> Bruce Momjian wrote:
>  > I just saw it [the slides about PGCluster-II].  It does seem more like
>  > Oracle RAC than any other method.
> 
> Yes. I think it's not production ready, yet, so there's no point in 
> mentioning it in the documentation.

OK.

> Bruce Momjian wrote:
>  > I figured that shared-disk/memory only really makes sense for
>  > multi-master clustering, so I mentioned it in that paragraph:
>  >
>  > ...<snipped the new paragraph>
>  >
>  > Is that enough?
> 
> I'd say so, yes. We are not going into more details for other aspects so 
> that's fine.

OK.

> You might not even mention shared-memory. I don't know of any 
> implementation in the database world. Except perhaps using OpenMosix and 
> running PostgreSQL on top of it. Maybe just leave it in there, it won't 
> hurt.

OK, I will only mention shared disk now.

> Bruce Momjian wrote:
>  > One problem I have is that we have shared disk failover, but no
>  > other shared case with a PostgreSQL implementation, and people don't
>  > want to mention Oracle RAC, so why do we mention it if we have no
>  > implementations even in the works?
> 
> Most probably you're already aware that with PGCluster-II we have such 
> an implementation in the works.

I do now.  :-)  I think we are OK with the additional sentence about
shared disk in the Synchronous Multi-Master Replication section, right?

> *** Asynchronous Multi-Master Replication ***
> 
>  >> Again, IMHO, "Parallel Query Execution" says everything. The word
>  >> 'Clustering' does not help, because it's not defined nor commonly
>  >> used in any helpful way (probably besides marketing).
>  >
>  > OK, new title is Multi-Server Parallel Query Execution.  If I have
>  > just "Parallel Query Execution", it could be multi-process parallel
>  > query execution.
> 
> Yes, the new title is good.
> 
> In the text below, you are mainly describing what I call 'disconnected 
> operation' (does somebody have a better, more common term for that?). But 
> the main advantage of async replication is having no delay before commit, 
> thus giving better performance for writing transactions.
> 
> In case of async, multi master replication, conflicts can arise, which
> have to be resolved. I think your example does not make it clear that 
> this applies to async, multi master replication in general. And that 
> those can sometimes be resolved automatically.

OK, good point, section updated:

	  <term>Asynchronous Multi-Master Replication</term>
	  <listitem>
	
	   <para>
	    For servers that are not regularly connected, like laptops or
	    remote servers, keeping data consistent among servers is a
	    challenge.  Using asynchronous multi-master replication, each
	    server works independently, and periodically communicates with
	    the other servers to identify conflicting transactions.  The
	    conflicts can be resolved by users or conflict resolution rules.
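
To make the "identify conflicting transactions" step concrete, here is a
minimal sketch of what such a periodic exchange could look like. It is
only an illustration, not how any PostgreSQL replication tool works: the
Write record, find_conflicts(), last_writer_wins() and reconcile() are
hypothetical names, and it assumes each server made at most one write
per key since the last sync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One change accepted locally by a server (hypothetical record)."""
    server: str       # which server accepted the write
    key: str          # identifier of the row that was changed
    value: str        # new value of the row
    timestamp: float  # local commit time on that server


def find_conflicts(a_writes, b_writes):
    """Return pairs of writes that changed the same key to different values."""
    b_by_key = {w.key: w for w in b_writes}
    return [
        (w, b_by_key[w.key])
        for w in a_writes
        if w.key in b_by_key and b_by_key[w.key].value != w.value
    ]


def last_writer_wins(a, b):
    """One possible conflict resolution rule: keep the later commit."""
    return a if a.timestamp >= b.timestamp else b


def reconcile(a_writes, b_writes, rule=last_writer_wins):
    """Merge both write sets, applying `rule` wherever they conflict."""
    merged = {w.key: w for w in a_writes}
    for w in b_writes:
        if w.key in merged and merged[w.key].value != w.value:
            merged[w.key] = rule(merged[w.key], w)
        else:
            merged.setdefault(w.key, w)
    return merged
```

Passing a different `rule` (for example, one that queues the pair for a
user to decide) covers the "resolved by users" case from the paragraph
above.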

> 
> 
> *** Multi-Master Parallel Query Execution ***
> 
> Bruce Momjian wrote:
>  > Uh, multi-master replication allows for load balancing, but it doesn't
>  > help a single query to run any faster.  Think of having only one query
>  > running on the cluster.  Parallel execution allows a single query to
>  > use more than one computer, right?
> 
> Right.
> 
>  > Uh, this confuses me.  What is missing?  You split tables across
>  > multiple servers.
> 
> In "Multi-Master Parallel Query Execution" you write: "One possible way 
> this could work is for the data to be split among servers". So the 
> example you give involves Data Partitioning.

OK.
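
As a rough illustration of the split-data case quoted above, here is a
minimal scatter/gather sketch. It is an assumption-laden toy, not a
description of any existing tool: the PARTITIONS dictionary stands in
for three servers that each hold one slice of a table, and
partial_sum() and parallel_sum() are hypothetical names. Threads are
used only to mimic servers working at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three servers, each holding one slice of a table.
PARTITIONS = {
    "server_a": [5, 12, 7],
    "server_b": [3, 20],
    "server_c": [9, 1, 4, 8],
}


def partial_sum(server):
    """The work one server would do locally on its own slice."""
    return sum(PARTITIONS[server])


def parallel_sum():
    """Send the same aggregate to every server, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = list(pool.map(partial_sum, PARTITIONS))
    return sum(partials)
```

The single query (a sum over the whole table) uses every server because
each one only has to scan the rows it holds.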

> I wanted to point out that another way to do Parallel Query Execution is 
> using Multi-Master Replication to have equal replicas and then query 
> them in parallel. I don't think there is any solution for that, yet. 
> Except, perhaps PGPool-II can do it?

Uh, if the data isn't partitioned, what value is there to hitting
multiple servers for a single query?  I am confused.

> *** Introduction Text on the top ***
> 
> Bruce Momjian wrote:
>  > OK, updated to add "little" delay, and removed "small" from async
>  > case:
>  >
>  >   load-balanced servers will return consistent results with little
>  >   propagation delay. Asynchronous updating has a delay between the
> 
> Hm, that does not address my concerns. But after thinking about it, I 
> can accept the term 'consistent results' - it's clear enough what it 
> means. I'm probably thinking into too many details...

OK.

> But now, the "little delays" certainly is in the wrong place. Such 
> delays occur before commit, not before returning results.

Uh, I don't think the "little" appears to talk about the results, but
only the propagation.

> Maybe revert it back to "..no propagation delay". Or completely leave 
> away the "no propagation delay".

OK, how is this new text?

  This guarantees that a failover will not lose any data and that
  all load-balanced servers will return consistent results no matter
  which server is queried.

-- 
  Bruce Momjian   bruce(at)momjian(dot)us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
