Re: [Sequoia] PostgreSQL Documentation of High Availability and Load

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Emmanuel Cecchet <emmanuel(dot)cecchet(at)continuent(dot)com>
Cc: pgsql-docs(at)postgresql(dot)org, Sequoia general mailing list <sequoia(at)lists(dot)forge(dot)continuent(dot)org>, pgpool-hackers(at)pgfoundry(dot)org, pgcluster-general(at)pgfoundry(dot)org, community(at)gorda(dot)di(dot)uminho(dot)pt
Subject: Re: [Sequoia] PostgreSQL Documentation of High Availability and Load
Date: 2006-11-21 10:08:19
Message-ID: 4562D013.2010504@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

Hello Emmanuel,

Emmanuel Cecchet wrote:
> I just quickly went through the classification.
> I don't think that your description of 'multi-master replication using
> clustering' is correct. Oracle RAC is a shared disk approach and just
> send cache invalidations to other nodes but not actual data. As the disk
> is shared, data is only commited once to disk and there is a distributed
> locking protocol to make nodes agree on a serializable transactional order.

I agree with that. AFAI understand, we are not entirely sure how much of
that we want to cover in our documentation.

> You can have a look at the last ApacheCon presentation I gave
> (http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf),

thank you for that link, I'll read through it ASAP.

> there is a comparison of most solutions. I think that it would be cool
> to have a comparison matrix with the different features that you may
> need in an HA solution and see how the different solutions can provide
> an answer to these problems.

We've already come to the consensus, that we don't want to list all
products in our documentation, but better do that on the website, where
it can easily be changed at any time. The documentation should explain
fundamentals.

> What you describe as 'Statement-Based Replication Middleware' which is
> in fact multi-master replication at the middleware level, is usually
> more than just a simple proxy that broadcasts all queries. It must
> provide 1-copy-serializability if you want the cluster to be consistent.

Yes, the current description is very much targeted at pgpool. Thank you
very much for your input, that helps to clarify things.

> This means that write queries must be sent in the same serializable
> order to every node. As SQL is interpreted, macros can be replaced on
> the fly by the middleware with cluster-wide values. About sequences,
> calls to sequences must be broadcast as if it were writes. This is
> needed to update sequences the same way on all nodes. Note that
> sequences are parts of unrollbackable changes of the database. This
> means that even if a transaction rollbacks at one node, its sequence
> will not be rollbacked meaning that all other nodes must also play
> rollbacked transactions (at runtime or recovery time) if they were
> accessing sequences or involving other operations that the database
> cannot rollback.
> From a performance standpoint, the middleware approach can balance
> queries that come from the same client connection to different nodes
> which is not the case when you are directly connected to a database
> instance. Also failover can be made fully transparent at the middleware
> level (any node failure can be completely hidden to the client). With
> Sequoia, we even hide middleware failures by providing transparent
> failover code in our driver.
> The discussion should add a section describing real HA features (not
> just load balancing):
> - Do you lose data on failure?
> - Will my transactions fail on a node failure?
> - Is failover transparent? Does it need special support in the client
> application?
> - Is failback a manual process?
> - How long does it take to restart/resynchronize a node?
> - Can I add nodes on the fly to the cluster?
> - Can I do maintenance operations without stopping the cluster?
> - Can I upgrade the cluster without interruptions (e.g. migrate from
> PostgreSQL 7.4 to 8.0 without interruption) ?
> - WAN support? How network partitions are handled?

These are good questions to analyze a certain solution. As far as our
documentation is concerned, I think giving rough estimates for
categories of replication algorithms is sufficient (i.e. stating that
Multi Master Replication scales very good for reading transactions, but
not very well for writing ones).

> When configured with RAIDb-1,

I know RAID-1, but what's a RAIBd-1?

> Sequoia provide a multi-master (full)
> replication at the middleware level. Note that Sequoia also allows for
> partial replication or data partitioning (but the granularity is the
> table meaning that you can only distribute tables on different nodes but
> not slice a table and split it on different nodes as you describe in
> Data Partitioning).
>
> Thanks again for this great work and I hope this will help improve the
> documentation.

Sure, thank you very much for your input. I'm glad we have the
possibility to cover sequoia, too.

> Note that there is also Postgres-R that is another approach to
> middleware-based replication.

Middleware? Mrs. Kemme also did some research with middleware
replication, but I wouldn't exactly call Postgres-R a middleware-based
replication solution. I've ported Postgres-R to PostgreSQL 8.2 and can
assure you that it's very well embedded into the backend of PostgreSQL.
Probably too well for some ;-) You might want to check out my (slightly
outdated) website www.postgres-r.org.

Regards

Markus

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Markus Schiltknecht 2006-11-21 11:35:45 Re: PostgreSQL Documentation of High Availability and Load Balancing
Previous Message Markus Schiltknecht 2006-11-21 09:42:11 Re: "Clustering"