Re: Use of rsync for data directory copying

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: David Kerr <dmk(at)mr-paradox(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Use of rsync for data directory copying
Date: 2012-07-15 02:57:22
Message-ID: 20120715025722.GA3215@momjian.us
Lists: pgsql-hackers

On Sat, Jul 14, 2012 at 09:17:22PM -0400, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > If two writes happen in the middle of a file in the same second, it
> > seems one might be missed. Yes, I suppose the WAL does fix that during
> > replay, though if both servers were shut down cleanly, WAL would not be
> > replayed.
> >
> > If you are using it for a hot backup, WAL would clean that up.
>
> Right... If it's hot backup, then WAL will fix it; if it's done after a
> clean shut-down, nothing should be writing to those files (much less
> multiple writes in the same second), so checksum shouldn't be
> necessary...
>
> If you're doing rsync w/o doing pg_start_backup/pg_stop_backup, that's
> not likely to work even *with* --checksum.
>
> So, can you explain which case you're specifically worried about?

OK. The basic problem is that I previously was not clear about how
reliant our use of rsync (without --checksum) was on the presence of WAL
replay.

Here is an example from our documentation that doesn't have WAL replay:

http://www.postgresql.org/docs/9.2/static/backup-file.html

Another option is to use rsync to perform a file system backup. This is
done by first running rsync while the database server is running, then
shutting down the database server just long enough to do a second rsync.
The second rsync will be much quicker than the first, because it has
relatively little data to transfer, and the end result will be
consistent because the server was down. This method allows a file system
backup to be performed with minimal downtime.

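Now, if a write happens in both the first and second half of the same
second, and only the first write is seen by the first rsync, I don't
think the second rsync will see the later write: the file's size and
modification time are unchanged from what the first rsync recorded, so
rsync's default size-and-mtime check skips the file, and hence the
backup will be inconsistent.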
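The scenario above can be modelled in a few lines. This is a toy Python sketch, not rsync itself. It assumes a filesystem that stores modification times in whole seconds (as ext3 does), and it models rsync's default quick check as a comparison of size and mtime only, and --checksum as a comparison of file contents. The helper names (`FileState`, `write`, `sync`, `two_pass_backup`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileState:
    data: bytes
    mtime: int  # whole seconds: assumes a filesystem with 1-second timestamps


def write(f: FileState, offset: int, payload: bytes, t: float) -> None:
    """Overwrite bytes in place (file size unchanged) at wall-clock time t."""
    f.data = f.data[:offset] + payload + f.data[offset + len(payload):]
    f.mtime = int(t)  # sub-second part is lost on a 1-second-resolution filesystem


def quick_check_differs(src: FileState, dst: FileState) -> bool:
    """Toy model of rsync's default quick check: compare size and mtime only."""
    return len(src.data) != len(dst.data) or src.mtime != dst.mtime


def checksum_differs(src: FileState, dst: FileState) -> bool:
    """Toy model of --checksum: compare file contents via a hash (MD5 here)."""
    return hashlib.md5(src.data).digest() != hashlib.md5(dst.data).digest()


def sync(src: FileState, dst: FileState, use_checksum: bool = False) -> bool:
    """Copy src over dst if the chosen comparison reports a difference.

    Returns True if the file was transferred.
    """
    differs = checksum_differs(src, dst) if use_checksum else quick_check_differs(src, dst)
    if differs:
        dst.data = src.data
        dst.mtime = src.mtime  # preserve mtime, as rsync -a (which implies -t) does
    return differs


def two_pass_backup(use_checksum: bool) -> tuple[bool, bool]:
    """Run the two-pass scenario; return (second pass copied?, backup consistent?)."""
    src = FileState(bytes(16), mtime=50)
    dst = FileState(b"", mtime=0)

    write(src, 0, b"A", t=100.2)  # first half of second 100
    sync(src, dst)                # pass 1, server still running
    write(src, 8, b"B", t=100.7)  # second half of second 100, after pass 1 read the file

    # Server is now shut down cleanly, so no WAL replay; pass 2.
    copied = sync(src, dst, use_checksum=use_checksum)
    return copied, dst.data == src.data


if __name__ == "__main__":
    print("quick check:", two_pass_backup(use_checksum=False))
    print("--checksum:", two_pass_backup(use_checksum=True))
```

In this model `two_pass_backup(use_checksum=False)` returns `(False, False)`: the second pass sees equal sizes and equal whole-second mtimes, skips the file, and leaves a copy that differs from the source. With `use_checksum=True` it returns `(True, True)`, because the content comparison notices the later write.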

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
