Re: Server move using rsync

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Stephen Denne <stephen(dot)denne(at)datam(dot)co(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Server move using rsync
Date: 2011-10-28 05:14:43
Message-ID: CAOR=d=3iEKSxzbPRGpYH8xzevCzJovH4PBG1m6T6BA2fMKt=_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Oct 27, 2011 at 7:37 PM, Stephen Denne
<stephen(dot)denne(at)datam(dot)co(dot)nz> wrote:
> We're intending to move a 470GB PostgreSQL 8.3.13 database using the following technique from http://www.postgresql.org/docs/8.3/interactive/backup-file.html
>
> "Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database server just long enough to do a second rsync. The second rsync will be much quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime."
>
> Except that we plan on an initial rsync which we think might take a couple of days, then subsequent daily rsyncs for up to a week to keep it up to date till we stop the old database, rsync again, and start the new database.

Sounds reasonable. don't forget the --delete switch or the
destination will just keep growing and growing.

> A very rough approximation of our database would be half a dozen large tables taking up 1/3 of the disk space, and lots of indexes on those tables taking the other 2/3 of the space.
>
> If we assume usage characteristics of:
> Much less than 1% of indexed data changing per day, with almost all of those updates being within the 1% of most recently added data.
> Much less than 1% of historical indexed data being deleted per day with most of the deletions expected to affect sets of contiguous file pages.
> About 1% of new indexed data added per day
>
> I'm curious of the impact of vacuum (automatic and manual) during that process on expected amount of work rsync will have to do, and time it will take, and on what the update pattern is on files of Btree indexes.
>
> Is it worth making sure vacuum is not run, in order to reduce the amount of files that change during that period?

Probably not. You can test that theory by turning off vacuum for a
day to see how much of a change it makes. My semi-educated
scientific wild-assed guess is it won't make any difference, since the
file / block will be changed with or without the vacuum, and still
have to be copied.,

> Is there any way during that week, that we can verify whether our partially completed database move process is going to result in a database that starts up ok?

Try starting it up?

In general, the lower the traffic when you rsync the better the
chances, but honestly if you're not stopping the database then you
shouldn't count on luck to make it work. Note that you CAN do the
whole rsync followed by setting up PITR to get a coherent database
backup that is guaranteed to start up, assuming you've followed all
the instructions on how to set up PITR properly.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message kucoj 2011-10-28 06:49:50 Re: Unable to write inside TEMP environment variable path
Previous Message Satoshi Nagayasu 2011-10-28 04:51:58 Re: pglesslog for Postgres 9.1.1