Re: Initiating log shipping backups when the initial filesystem-level backup went out of sync

From: Wouter Verhelst <wouter(dot)verhelst(dot)ext(at)huawei(dot)com>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Initiating log shipping backups when the initial filesystem-level backup went out of sync
Date: 2016-09-05 09:06:11
Message-ID: 2440DD2D6E0BA346934AFE42CB9080AB77BC0B15@lhreml503-mbx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

> Hi,
>
> A while back, I had to replace the hardware for a postgres server
> containing about 600G of data. While at it, we did an upgrade of the
> postgresql version used, and we had intended to also modify the way
> backups were being done, by moving from a daily pg_dumpall in cron to
> shipping WAL logs to an offsite location.
>
> Due to the many changes involved, I decided that it would be easier to
> copy the data by running "ssh oldserver pg_dumpall | psql postgres",
> but I did (approximately) the following:
>
> - Create the cluster
> - Set "wal_level = hot_standby", "archive_mode = on", and
> "archive_command = 'test ! -f /srv/pg_wal/%f && cp %p /srv/pg_wal/%f'
> in postgresql.conf.
> - Set up everything so that files written to /srv/pg_wal end up
> (eventually) on the remote server.
> - rsync the data directory to the off-site server
> - Start the server
> - Run the pg_dumpall thing
> - Start the server on the off-site location, to test if it could open
> everything, and stop it again
> - Create a file "recovery.conf" in the wrong location (at first) and
> then in the correct location (later on, after reading the documentation
> more carefully), with a line saying "restore_command = 'cp
> /srv/pg_wal/%f "%p"'" and one saying "standby_mode = on"
> - Start the server
>
> At this point, the log file shows the following message:
>
> 2016-08-29 13:53:22 CEST [25504-1] LOG: database system was shut down
> in recovery at (...)
> 2016-08-29 13:53:22 CEST [25504-2] LOG: entering standby mode
> 2016-08-29 13:53:22 CEST [25504-3] LOG: record with zero length at
> 0/2000200
> 2016-08-29 13:53:22 CEST [25504-4] LOG: invalid primary checkpoint
> record
> 2016-08-29 13:53:22 CEST [25504-5] LOG: record with zero length at
> 0/2000198
> 2016-08-29 13:53:22 CEST [25504-6] LOG: invalid secondary checkpoint
> record
> 2016-08-29 13:53:22 CEST [25504-7] PANIC: could not locate a valid
> checkpoint record
> 2016-08-29 13:53:22 CEST [25480-1] LOG: startup process (PID 25504)
> was terminated by signal 6: Aborted
> 2016-08-29 13:53:22 CEST [25480-2] LOG: aborting startup due to
> startup process failure
>
> In the pg_xlog directory on the off-site server, there is a file
> 000000010000000000000002 and one 000000010000000000000003. When I run
> md5sum, I find that the first has the same checksum as the one with the
> same name in the pg_wal directory containing the files from the
> original server; the second does not, but it *does* have the same
> checksum as the file 000000010000000000000001 in that directory (which
> does not exist in the pg_xlog directory on the off-site server).
>
> I have retained all WAL files from 000000010000000000000001 all the way
> to the current one.
>
> Is it still possible for me to load these WAL files into the server at
> the remote site, without having to do a filesystem-level copy of all
> the files in the postgresql database directory? If not, any hints on
> what I did wrong?
>
> Thanks,

Ping?

Is this possible, or should I spend time to rsync files over?

Thanks,

--
Wouter Verhelst

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Κοκμάδης Δημήτριος 2016-09-05 09:41:25 Autovacuum Running Timeframe
Previous Message Eduardo Morras 2016-08-31 21:14:34 Re: What causes streaming replication delay? [SOLVED, I think]