Re: High Availability, Load Balancing, and Replication Feature Matrix

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: PostgreSQL-documentation <pgsql-docs(at)postgresql(dot)org>
Subject: Re: High Availability, Load Balancing, and Replication Feature Matrix
Date: 2007-11-11 14:52:57
Message-ID: 200711111452.lABEqv023393@momjian.us
Lists: pgsql-docs pgsql-hackers

Markus Schiltknecht wrote:
> Hello Bruce,
>
> thank you for your detailed answer.
>
> Bruce Momjian wrote:
> > Not sure if you were around when we wrote this chapter but there was a
> > lot of good discussion to get it to where it is now.
>
> Uh.. IIRC quite a good part of the discussion for chapter 23 was between
> you and me, almost exactly a year ago. Or what discussion are you
> referring to?

Sorry, I forgot who was involved in that discussion.

> >> First of all, I don't quite like the negated formulations. I can see
> >> that you want a dot to mark a positive feature, but I find it hard to
> >> understand.
> >
> > Well, the idea is to say "what things do I want and what offers it?" If
> > you have positive/negative it makes it harder to do that. I realize it
> > is confusing in a different way. We could split out the negatives into
> > a different table but that seems worse.
>
> Hm.. yeah, I can understand that. As those are things the user wants, I
> think we could formulate positive wishes. Just a proposal:
>
> No special hardware required: works with commodity hardware
>
> No conflict resolution necessary: maintains durability property
>
> Master failure will never lose data: maintains durability
> on single node failure
>
> With the other two I'm unsure.. I see it's very hard to find helpful
> positive formulations...

Yea, that's where I got stuck --- that the positives were harder to
understand.

> >> What I'm especially puzzled about is the "master never locks others". All
> >> first four, namely "shared disk failover", "file system replication",
> >> "warm standby" and "master slave replication", block others (the slaves)
> >> completely, which is about the worst kind of lock.
> >
> > That item assumes you have slaves that are trying to do work.
>
> Yes, replication in general assumes that. So does high availability,
> IMO. Having read-only slaves means nothing but locking them out of
> write access.
>
> > The point
> > is that multi-master slows down the other slaves in a way no other
> > option does,
>
> Uh.. you mean the other masters? But according to that statement, "async

Sorry, I meant that a master that is modifying data is slowed down by
other masters to an extent that doesn't happen in other cases (e.g. with
slaves). Is the current "No inter-server locking delay" OK?

> multi-master replication" as well as "statement-based replication
> middleware" should not have a dot, because those as well slow down other
> masters. In the async case at different points in time, yes, but all
> masters have to write the data, which slows them down.

Yea, that is why I have the new text about locking.

> I suspect you are rather talking about the network-dependent commit
> latency of eager replication solutions. I find the term "locking delay"
> for that rather confusing. How about: "normal commit latency"? (Normal,
> as in: depends on the storage system used, instead of on the network and
> storage).

Uh, I assume that multi-master locking often happens before the commit.

> > which is the reason we don't support it yet.
>
> Uhm.. PgCluster *is* a synchronous multi-master replication solution. It
> is also middleware and does statement-based replication. Which dots
> of the matrix do you think apply for it?

I don't consider PgCluster middleware because the servers have to
cooperate with the middleware. And I am told it is much slower for
writes than a single server, which supports my "locking" item, though
the delay is really more "waiting for other masters", I think.

> >> Comparing between "File System Replication" and "Shared Disk Failover",
> >> you state that the former has "master server overhead", while the latter
> >> doesn't. Seen solely from the single server node, this might be true.
> >> But summarized over the cluster, you have a network with a quite similar
> >> load in both cases. I wouldn't say one has less overhead than the other
> >> by definition.
> >
> > The point is that file system replication has to wait for the standby
> > server to write the blocks, while disk failover does not.
>
> In "disk failover", the master has to wait for the NAS to write the
> blocks on mirrored disks, while in "file system replication" the master
> has to wait for multiple nodes to write the blocks. As the nodes of a
> replicated file system can write in parallel, very much like a RAID-1
> NAS, I don't see that much of a difference there.

I don't assume the disk failover setup has mirrored disks. It can, just
like a single server can, but the mirroring isn't part of the backend
process, and I assume a RAID card with RAM that can cache writes. In the
file system replication case the server has to send commands to the
mirror and wait for completion.
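
Roughly, the extra round trip I am thinking of looks like this --- a toy
Python sketch under my own assumptions, not how any particular replicated
filesystem or NAS actually behaves:

    import socket

    def fs_replicated_write(block: bytes, local_path: str,
                            mirror: socket.socket) -> None:
        # The local write -- this is all a shared disk setup has to wait
        # for, and a battery-backed RAID cache may absorb even that.
        with open(local_path, "r+b") as f:
            f.write(block)
        # File system replication additionally ships the block to the
        # standby and waits for its acknowledgement before the write is
        # considered done.
        mirror.sendall(block)
        mirror.recv(1)   # block here until the mirror acks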

> > I don't think
> > the network is an issue considering many use NAS anyway.
>
> I think you are comparing an enterprise NAS to a low-cost, commodity
> hardware clustered filesystem. Take the same amount of money and the
> same number of mirrors and you'll get comparable performance.

Agreed. In the one case you are relying on another server, and in the
NAS case you are relying on a black box server. I think the big
difference is that the other server is a separate entity, while the NAS
is a shared item.

> > There is no dot there so I am saying "statement based replication
> > solution" requires conflict resolution. Agreed you could do it without
> > conflict resolution and it is kind of independent. How should we deal
> > with this?
>
> Maybe a third state: 'n/a'?

Good idea, or "~". How would middleware avoid conflicts, i.e. how would
it know that two incoming queries were in conflict?
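
The most I can picture a middleware doing is a crude statement-level
guess, something like the sketch below (hypothetical Python, not what any
real middleware does), which is exactly why it can't see row-level
conflicts:

    import re

    # Made-up check: call two statements "in conflict" when they write to
    # the same table.  Real SQL parsing is much harder, and even this says
    # nothing about whether the same rows are touched.
    WRITE_TARGET = re.compile(
        r"\s*(?:UPDATE|DELETE\s+FROM|INSERT\s+INTO)\s+(\w+)", re.IGNORECASE)

    def write_target(stmt):
        m = WRITE_TARGET.match(stmt)
        return m.group(1).lower() if m else None

    def may_conflict(stmt_a, stmt_b):
        a, b = write_target(stmt_a), write_target(stmt_b)
        return a is not None and a == b

    # may_conflict("UPDATE accounts SET ...", "DELETE FROM accounts ...")
    # returns True even if the two statements touch different rows.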

> >> And in the special case of (async, but eager) Postgres-R also to "async
> >> multi-master replication" and "no conflict resolution necessary".
> >> Although I can understand that that's a pretty nifty difference.
> >
> > Yea, the table isn't going to be 100% but tries to summarize what is in
> > the section above.
>
> That's fine.
>
> > [...]
> >
> > Right, but the point of the chart is to give people guidance, not to
> > give them details; that is in the part above.
>
> Well, sure. But then we are back at the discussion of the parts above,
> which is quite fuzzy, IMO. I'm still missing those details. And I'm
> dubious about it being a basis for a feature matrix with clear dots or
> no dots. For the reasons explained above.
>
> >> IMO, "data partitioning" is entirely orthogonal to replication. It
> >> can be combined, in various ways. There's horizontal and vertical
> >> partitioning, eager/lazy and single-/multi-master replication. I guess
> >> we could find a use case for most of the combinations thereof. (Kudos
> >> for finding a combination which definitely has no use case).
> >
> > Really? Are you saying the office example is useless? What is a good
> > use case for this?
>
> Uhm, no sorry, I was unclear here. And not even correct. I was trying to
> say that there's a use case for each and every combination of the three
> properties above.

OK.

> I'm now revoking one: "master-slave" combines very badly with "eager
> replication". Because if you do eager replication, you can as well have
> multiple masters without any additional cost. So, only these three

Right. I was trying to hit typical usages.

> combinations make sense:
>
> - lazy, master-slave
> - eager, master-slave
> - eager, multi-master

Yep.

> Now, no partitioning, horizontal partitioning, and vertical partitioning
> can each be combined with any of the above replication methods, giving a
> total of nine combinations, all of which make perfect sense for certain
> applications.
>
> If I understand correctly, your office example is about horizontal data
> partitioning, with lazy, master-slave replication for the read-only copy
> of the remote data. It makes perfect sense.

I did move it below and removed it from the chart because, as you say,
how to replicate to the slaves is an independent issue.
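
For what it's worth, the way I picture the office case is roughly the
routing below --- a made-up Python sketch with invented server names, and
with the question of how the read-only copies get refreshed left as the
separate replication choice:

    # Hypothetical horizontal partitioning by office: each office's server
    # is the only one that takes writes for its own rows; every server also
    # keeps a read-only copy of the other office's rows.
    SERVERS = {"london": "db-london.example.com",
               "paris": "db-paris.example.com"}

    def server_for_write(office):
        # writes must go to the server that owns the office's rows
        return SERVERS[office]

    def server_for_read(local_office):
        # reads, even of the remote office's data, can use the local copy
        return SERVERS[local_office]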

> With regard to replication, there's another feature I think would be
> worth mentioning: dynamic addition or removal of nodes (masters or
> slaves). But that's solely implementation dependent, so it probably
> doesn't fit into the matrix.

Yea, I had that but found you could add/remove slaves easily in most
cases.

> Another interesting property I'm missing is the existence of single
> points of failure.

Ah, yea, but then you get into power and fire issues.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
