Re: replication/redundancy

From: Jonathan Gardner <jgardner(at)jonathangardner(dot)net>
To: weigelt(at)metux(dot)de, pgsql-admin(at)postgresql(dot)org
Subject: Re: replication/redundancy
Date: 2003-07-01 15:22:37
Message-ID: 200307010822.39288.jgardner@jonathangardner.net
Lists: pgsql-admin

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 30 June 2003 09:17, weigelt(at)metux(dot)de wrote:
> On Mon, Jun 30, 2003 at 08:31:09AM -0700, Jonathan Gardner wrote:
>
> * currently only an explicit sync-out is supported - from time to time
> every table has to be scanned for new records

So you are using "lazy" rather than "eager" replication. I am sure you know
the limitations for lazy replication. Let me enumerate them here for those of
you who aren't familiar with this:

1) The data is not consistent. This means if you run the same select query
at the same time on the two databases, you may get different results. For
some situations, that is okay (like Usenet). For others, it is not (like
registrations: you'll sign up on one database, but you won't appear on the
other until the next sync).

2) The "other" process that does the synchronization is serial in nature.
The processes that change the database are parallel in nature. It is very
possible to have changes happening to the database faster than you can
replicate them. This was a real problem at a web company I recently worked
for that used lazy replication. Their backup database fell weeks behind the
live database. It almost got to the point where recreating the entire
database would've been faster than waiting for the replication process to
catch up.

3) These two factors above make using the second database as a hot-swappable
backup risky at best. You will lose some data when you switch to the backup,
unless changes to the database are so rare that the backup is usually up to
date. If that is the case, you probably don't need the backup in the first
place, because databases that don't do much tend not to be very important.
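
To make point 2 concrete, here is a rough sketch in Python of how the backlog
grows when changes arrive faster than a single serial replicator can apply
them. The function name and the rates are made up for illustration; they are
not measurements from any real system.

```python
def backlog_over_time(write_rate, replay_rate, seconds):
    """Return the count of unreplicated changes after each second.

    write_rate:  changes per second made by all (parallel) writers
    replay_rate: changes per second the single serial replicator applies
    Both rates are hypothetical numbers chosen for illustration.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += write_rate                 # writers add new changes
        backlog -= min(backlog, replay_rate)  # replicator drains what it can
        history.append(backlog)
    return history


# Writers outpace the replicator: the backlog grows without bound.
lagging = backlog_over_time(write_rate=120, replay_rate=100, seconds=5)

# Replicator is faster than the writers: the backlog stays at zero.
keeping_up = backlog_over_time(write_rate=80, replay_rate=100, seconds=5)

print(lagging)     # [20, 40, 60, 80, 100]
print(keeping_up)  # [0, 0, 0, 0, 0]
```

Once the write rate exceeds the replay rate, the backup never catches up on
its own. It only recovers if the write load drops or the replicator gets
faster.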

>
> * currently no real conflict handling
>

What he is talking about here is what happens when two separate processes are
working on the same rows. PostgreSQL uses transactions and locking right now,
so two processes on the same system cannot clobber each other's changes.
However, his system cannot handle this at all when the two processes are on
separate machines.

The most obvious problem with this comes from incrementing a column. If both
processes try to increment the same column, then they will end up with the
column incremented by one or the other, but not both. This would be bad for
things like PayPal, where your account would only increase by one transfer
or the other, rather than both, if two occurred at the same time.
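
Here is a minimal sketch of that lost update in Python. The two dicts stand in
for the same row on two machines, and naive_sync is a hypothetical stand-in
for a sync step that copies row values with no conflict handling. It is not
the code of the system under discussion.

```python
# The same account row as it exists on two separate machines.
replica_a = {"balance": 100}
replica_b = {"balance": 100}


def deposit(replica, amount):
    """Increment the balance on one replica only (a local update)."""
    replica["balance"] = replica["balance"] + amount


def naive_sync(source, target):
    """Copy the row value across, overwriting whatever the target had."""
    target["balance"] = source["balance"]


# Two transfers arrive at about the same time, one on each machine.
deposit(replica_a, 10)
deposit(replica_b, 25)

# The sync then pushes replica_a's row over replica_b's.
naive_sync(replica_a, replica_b)

expected = 100 + 10 + 25
print(replica_a["balance"], replica_b["balance"], expected)  # 110 110 135
```

Both replicas now agree on 110, but the 25 deposited on the second machine is
gone. Syncing in the other direction would lose the 10 instead.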

>
> perhaps we can improve this a little bit.
>

I would hope you spend some time researching what others have done. Relational
databases are an area in which a tremendous amount of solid research has
already occurred. Applying yourself to understand the research and projects
that have gone before you will save you a lot of time replicating their work.
In other words, "If I have seen farther, it is because I have stood on the
shoulders of giants" to (mis?)quote Newton.

Again, to re-emphasize why pgreplication is so cool and why everyone should be
excited about this:
1) Database theory says that scalable, eager replication is impossible.
This is true in practice.
2) The Postgres-R team discovered a way to make scalable, eager replication
work. The restriction is that locks, once granted, may be aborted or revoked.
3) This means you will one day be able to set up a Beowulf-type cluster of
postgres databases that will rival the most powerful databases on earth
today.

- --
Jonathan Gardner <jgardner(at)jonathangardner(dot)net>
(was jgardn(at)alumni(dot)washington(dot)edu)
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/Aac+WgwF3QvpWNwRAgFxAJ9Mxesnc6Q3wLrUcL1Zz62AGLLjGACcCYJp
zcV9rFm8TiqH90N6eSpRQnY=
=/bFm
-----END PGP SIGNATURE-----
