Re: Using RSYNC for replication?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jason Hihn <jhihn1(at)umbc(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Using RSYNC for replication?
Date: 2003-01-28 14:58:49
Message-ID: 13051.1043765929@sss.pgh.pa.us
Lists: pgsql-general

Jason Hihn <jhihn1(at)umbc(dot)edu> writes:
> A sequence of events occurred to me today that left me wondering if I can
> rsync the raw files as a form of replication.

In general, you can't. There are very precise synchronization
requirements among the files making up the data directory, and there's
no way that a separate process like tar or rsync is going to capture a
consistent snapshot of all the files.

As an example: one of the recent reports of duplicate rows (in a table
with a unique index) seems to have arisen because someone tried to take
a tar dump of $PGDATA while the postmaster was running. When he
restored the tar, two different versions of a recently-updated row both
looked to be valid, because the table's data file was out of sync with
pg_clog.
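
To make that failure mode concrete, here is a toy model of one way a file-by-file copy could leave two versions of a row both looking valid. It is only a sketch and not PostgreSQL's actual visibility code: the `RowVersion` class, the `is_visible` rule, and the cached `xmin_known_committed` flag are simplifying assumptions made up for illustration, and the commit log is modelled as a plain dict.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowVersion:
    """Hypothetical, simplified row version (not the real tuple layout)."""
    value: str
    xmin: int                           # transaction that created this version
    xmax: Optional[int] = None          # transaction that superseded it, if any
    xmin_known_committed: bool = False  # assumed cached "creator committed" flag


def is_visible(row: RowVersion, clog: Dict[int, bool]) -> bool:
    """Toy rule: creator committed and no committed superseding transaction."""
    created = row.xmin_known_committed or clog.get(row.xmin, False)
    superseded = row.xmax is not None and clog.get(row.xmax, False)
    return created and not superseded


# Live server state: transaction 1 committed and inserted the row.
clog: Dict[int, bool] = {1: True}
table: List[RowVersion] = [RowVersion("v1", xmin=1)]

# The copy tool happens to read the commit log first...
clog_copy = dict(clog)

# ...then transaction 2 updates the row and commits on the live server...
table[0].xmax = 2
table.append(RowVersion("v2", xmin=2, xmin_known_committed=True))
clog[2] = True

# ...and only afterwards does the copy tool read the table file.
table_copy = copy.deepcopy(table)

live_visible = [r.value for r in table if is_visible(r, clog)]
restored_visible = [r.value for r in table_copy if is_visible(r, clog_copy)]

print(live_visible)      # ['v2']
print(restored_visible)  # ['v1', 'v2']
```

In this model the live server sees exactly one version, while the restored copy sees both, because the copied commit log predates the copied table file.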

If you had a dump utility that was aware of the synchronization
requirements, it *might* be possible to dump the files in an order that
would work reliably (I'm not totally sure about it, but certainly data
files before WAL would be one essential part of the rules). But
out-of-the-box tar or rsync won't get it right.

> I'd like to keep postmaster running, but flush and lock everything,
> then perform the copy via rsync so only the new data is propagated,
> all while postmaster is running.
> In general, data is only added to a few tables in the database, with
> updates occurring infrequently to the rest. Rarely are deletes ever done.
> During the sync neither DB will change except as part of the rsync.

If you checkpoint before the rsync, and guarantee that no updates occur
between that and the conclusion of the rsync, and *take down the
destination postmaster* while it runs, then it might possibly work.
But I'd never trust it. I'd also kinda wonder what's the point, if you
have to prevent updates; you might as well shut down the postmaster and
avoid the risk of problems.
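
The shut-everything-down alternative could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested procedure: the host name, the data directory paths, and the helper names `cold_copy_commands` and `run_all` are hypothetical, and it assumes `pg_ctl`, `rsync`, and passwordless `ssh` are available to the user running it. By default it only prints the commands.

```python
import subprocess
from typing import List


def cold_copy_commands(src_pgdata: str, dest_host: str,
                       dest_pgdata: str) -> List[List[str]]:
    """Build the command sequence for a copy with both postmasters stopped.

    All arguments are placeholders supplied by the caller (assumptions).
    """
    # Trailing slashes make rsync copy the directory contents, not the directory.
    src = src_pgdata.rstrip("/") + "/"
    dest = f"{dest_host}:{dest_pgdata.rstrip('/')}/"
    return [
        # 1. Stop the destination postmaster so nothing reads files mid-copy.
        ["ssh", dest_host, "pg_ctl", "stop", "-D", dest_pgdata, "-m", "fast"],
        # 2. Stop the source postmaster so the files cannot change.
        ["pg_ctl", "stop", "-D", src_pgdata, "-m", "fast"],
        # 3. Copy the whole data directory, deleting files gone from the source.
        ["rsync", "-a", "--delete", src, dest],
        # 4. Restart both sides (a real setup would likely also pass a log file).
        ["pg_ctl", "start", "-D", src_pgdata],
        ["ssh", dest_host, "pg_ctl", "start", "-D", dest_pgdata],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # abort on the first failure


if __name__ == "__main__":
    # Hypothetical paths and host name, for illustration only.
    run_all(cold_copy_commands("/var/lib/pgsql/data", "standby",
                               "/var/lib/pgsql/data"))
```

The ordering is the point of the sketch: both postmasters are down before rsync touches a single file, so the copy cannot observe a half-updated data directory.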

A final note is that I doubt this would be very efficient: wouldn't
rsync have to ship entire table files (and entire WAL files) for
even the most piddling change?

regards, tom lane
