Re: PostgreSQL Replication

From: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: PostgreSQL Replication
Date: 2006-07-27 18:52:03
Message-ID: 1154026323.660183.228610@h48g2000cwc.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

"roy simkes" wrote:
> Hi,
>
> I'm curious about how this replication thing works. I have read some articles and most
> of them was about Master-Multi Slave relation. Perhaps that's the primary use of
> replication, make everything in one server and see it's effects in every server. Also it's
> a performance optimization too. But, anyway, what about a Multi Master relation! I read
> that MySQL was giving a solution as MySQL Cluster but it's only recommended for LAN.

There are a number of other issues with MySQL's replication system such
that I'd strongly recommend that you research it in depth before going
ahead with it in a production environment.

> And If you have a LAN there you wont need a multi master replication system a lot if you
> ask me (of course I understand that this is done to backup data, and also if one server
> goes down the other keep going but that's not the point and not my main question.)

Hardware dies, disks fail, and Bad Things Happen. Furthermore, failing
over to a DR site is quite a bit more drastic than simply failing over
to a local replicated database. There are also query-offloading / load
balancing options as you noted above.

> What about I intend to make a system for WAN which are connected not via fiber optic
> cables but via internet! Is it recommended to use such a replication system. Because the
> replication will be very asynchronous (if the internet connection is low and if it isn't very
> stable) I guess.

Synchronous vs. async is a function of the replication tool you are
using. High latency (from a long distance internet connection for
example) for a synchronous system will have a very substantial impact
on the performance of replicated data. One of the main reasons
asynchronous systems exist is specifically to avoid that issue.

> A little scenario to explain what I meant more clearly. Let's say we have
> two computers which one is at USA and other in UK. They are both master and both slave
> according to the data written. If you enter a data to UK, USA becomes the slave and
> replicate that data. And the same thing vice versa. Is it possible with PostgreSQL to build
> such a system.

Not currently. SlonyM may solve this problem when (and if) it gets
built.

An alternative is to design your application such that you have two
tables, one of which is master in the USA and only accepts DML there,
and the other master in the UK. Then you pick one side to do
reconcilliation (for example: UK propegates data from a slave copy of
the USA table into the UK table). If you go this route, then you can
use slony1 to handle the table replication part of this solution. Note
that this means your application has to deal with the reconcilliation
problem. Depending on how your data works, this can be a very difficult
problem.

> Will it be stable and secure.

Slony1 is quite stable and deployed in a number of fairly critical
roles. The security is no better and no worse than the combination of
postgres' native security plus what ever you use to secure the
connection betwee the two sites. Common practice is to use a VPN, but
I'm not a network guy.

> How much data loss can be guessed (but yes even 1bit can't be acceptable in such a case
> but I don't think this can occur as the data still exists in the other server.)

I think you're asking about data integrity here. Slony1 provides
sequential consistency. In other words, it's asynchronous, so the data
in a replica will typically be behind the data in the origin. When the
changes to the data are propegated to the replica, Slony applies them
in the order that they were applied on the origin. This means that
while the data may not be perfectly up-to-date, it will at least always
be consistent.

If you don't need quite so strong a guarantee of integrity, then you
might want to check out pgcluster or pgpool. These are statement based
replication solutions, which may not provide quite as much consistency.
My experience with these tools is very limited.

> What are possible problems that can occur.

That's quite a list, if you're just asking about replication in
general. I suggest you refer to the various academic papers on the
subject. If you want to know about a specific replication solution,
then please refer to the documentation and FAQ for that software and
follow up as appropriate.

> And also what about other databases (MSSQL,Oracle,DB2,MySQL)

I am not aware of a WAN capable synchronous multi-master replication
solution for any database system.

> It's not very easy to answer this question I think and maybe it does not fit to the novice
> section, but I dunno much about replication so I fit to this mailing group :)

Co-incidentally, there's talk about a statement on the subject of
replication from the postgres team. You can find discussion on the
topic in the pgsql-hackers mailing list.

Drew

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Andrew Hammond 2006-07-27 19:06:23 Re: query help
Previous Message Andrew Hammond 2006-07-27 18:12:03 Re: Group By, Aggregate Functions and NULL