Re: Consistent file-level backup of pg data directory

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: gator_ml(at)yahoo(dot)de
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Consistent file-level backup of pg data directory
Date: 2014-01-08 18:55:25
Message-ID: CAMkU=1yTn1NK+AZ_mBzRZcdmEMLQNjBZWgzfE0qYaDTu3-VfKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Jan 8, 2014 at 7:09 AM, <gator_ml(at)yahoo(dot)de> wrote:

> On 2014-01-07 21:54, Jeff Janes wrote:
> > On Tue, Jan 7, 2014 at 10:58 AM, <gator_ml(at)yahoo(dot)de
> > <mailto:gator_ml(at)yahoo(dot)de>> wrote:
> > - keep a copy "backup_label" under a different name
> [...[
> > Why under a different name? That sounds dangerous to me.
> ... otherwise it would be useless, because "pg_stop_backup"
> will delete the "backup_label" file
>

That is one of the reasons you should run the backup before you tell
PostgreSQL that you finished the backup :)

> > - call pg_stop_backup()
> > - the run the regular backup procedure
> >
> >
> > You have this order backwards. You can only tell postgres that you are
> > done doing the backup once you are done doing the backup.
> this is what I tried 1st - unfortunately, this won't work at all,
> because postgres creates another checkpoint _after_ pg_stop and will
> refuse to run the restore procedure if this last wal file is missing
> (which in my scenario inevitably will be the case)
>

Eventually another checkpoint will be run, but pg_stop_backup will not
directly cause one to happen. What it creates is a record in the WAL
stream saying that the backup has finished. That record is what it needs
to see. After writing that record, it triggers a log switch (not the same
thing as a checkpoint) so that the xlog file containing the record will get
archived (if you are were truly using archiving).

That requirement is there to protect your data, you cannot safely
circumvent the need for it. If you have backed up the xlog file that
contains that record before the record was present, then that will be a
problem, as it won't find the record. The solution for that (other than
using archiving) is to backup everything before calling pg_stop_backup,
then backup just the pg_xlog after calling pg_stop_backup.

> > It is pretty much a disaster. Do you really need rsync? Otherwise, use
> > pg_basebackup, it is easier to get right.
>
> ... it looks like I didn't explain clear enough, what I am trying to
> accomplish. So let me try it again:
>
> We have a general backup procedure, that runs every night. For
> some machines, there may additionally be more
> specialized/fine-grained measures, but in any case, it should be
> possible to restore any given server without any machine-specific
> knowledge from the last nightly backup following a
> generic "restore-recipe". Our current rsync-based solution is a
> pretty good compromise between many different requirements.
>
> For machines running database systems, this means, this means,
> that I need some way to get a consistent state of some point in
> time. It does not particularly matter, which time exactly (in
> particular, I do not care, if transactions committed during the
> backup are included or not) For this purpose, I can define
> machine-specific hook to run before/ after the actual backup
> procedure.
>

I think it would be easier to just exclude the database from the
system-wide backup and use a different method for it, rather than engineer
the necessary before/after hooks onto the system-wide backup.

>
> Unfortunately, it does not look like there is any direct way to
> accomplish this with postgres except shutting down the whole
> database system while the backup is running. The systems will
> be almost idle while the backup runs, and write transactions
> will be very rare, but a total shutdown for every backup still
> would be too radical.
>
> Actually, with some tests copying the database directory while
> writing data to the database and then restarting postgres with
> such a copy, I did not manage to produce any visible
> inconsistencies. If postgres was able to restore the last
> consistent state in such a situation, then no additional measures
> would be needed (like I said, it is not particularly important
> which state this is - losing everything written during the last
> 15 minutes including committed transactions would be good enough)
> .But at least as far as I can tell from the documentation, this
> can not safely be assumed.
>

On small, nearly idle systems, you can often get away with doing a lot of
dangerous things. This is perhaps unfortunate, as it breeds bad habits.

> Otherwise, the procedures for "Continuous Archiving" are clearly
> made for some a diffent purpose, it seems like this is about as
> close as it gets to what I need. As far as I understand:
>
> - between pg_start_backup() and pg_stop_backup() it should be safe
> to copy the data directory any time
> - all committed transactions during this time will go to the wal files
> (actually, I noticed that files in other directories than pg_xlog are
> modified, too ...)
>

True, but the data in pg_xlog tells it how to replay those other changes to
those other directories. So it doesn't matter whether the other file was
backed up pre-modification or post-modification, as the WAL allows it to be
repaired either way. Unless the WAL stream ends too early--then it can't
repair them.

> - normally, it is intended to copy every single newly written wal file
> somewhere else by whatever is defined as "archive_command", but
> if this works, than any other way to copy all wal files written
> after pg_start_backup should work just as well.
>

Kind of. The archive_command conspires with database server to prevent it
from recycling a wal file until we know it has been successfully copied by
archive_command. It is difficult for "any other way" to make that
guarantee--although if the server is nearly idle, then the risk of such
recycling causing problems is rather low.

> My original idea was, to call pg_start_backup() 1st, then run the
> actual backup followed by pg_stop_backup. Because the procedure
> for " Point-in-Time Recovery" is the same and no wal file may be
> modified after it was written and "archive_command" called, it
> follows, that any wal file plus the preceding wal files written
> after pg_start_backup must contain everything that is needed to
> restore any state up to the time the wal file was
> written (otherwise the whole "PITR)" mechanism would not work).
>

PITR allows you to recover to any point after the backup successfully
ended. It does not allow you to recover to some point in the middle of the
backup (other than by starting at the previous successful backup and
rolling forward).

I think that this is a logical necessity. If you could use a backup set to
restore to a point in the middle of that same backup, then you wouldn't be
in the middle of the backup, you would be done with it.

> Unfortunately, postgres refuses to run the restore procedure if
> the last wal file (which is written after pg_stop_backup, so in
> my case it will not be available) even if "recovery_target_time"
> is explicitly set to some earlier time. I didn't try it, but
> assuming that the last wal file is not really needed in this
> case, it would be enough to just create an empty file with the
> name postgres is looking for?
>

It is really needed if you have high standards of data integrity and
safety. That is why PostgreSQL requires it.

>
> If I can't find, a better way, I probably could arrange to
> separately copy this last wal file into the
> backup (unfortunately, there is no easy way to do this ...)-:
>

You could just copy the entire pg_xlog directory again. It is easier,
probably safer (there is no guarantee that the "last" wal file is really
only one file, rather than two or three), and on a low-traffic server the
size is probably not all that large.

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2014-01-08 19:24:02 Re: Last inserted row id with complex PK
Previous Message Nelson Green 2014-01-08 18:16:07 Re: Last inserted row id with complex PK