Re: PITR problem

From: wstrzalka <wstrzalka(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: PITR problem
Date: 2008-04-30 09:32:35
Message-ID: fda66815-b317-4706-b323-968671451f5f@e39g2000hsf.googlegroups.com
Lists: pgsql-general

On 29 Apr, 17:16, e(dot)(dot)(dot)(at)myemma(dot)com (Erik Jones) wrote:
> On Apr 29, 2008, at 3:20 AM, wstrzalka wrote:
>
>
>
> >> What is the full pg_standby command string (restore_command=....) in
> >> your recovery.conf? It sounds like you have pg_standby set to delete
> >> archived WALs, possibly a little too aggressively. Do you have the
> >> -k flag set in the pg_standby call in your restore_command?
>
> > My restore command is:
> > -----------------------------------------------------------------------------------------
> > restore_command = 'pg_standby -l -d -s 5 -w 0 -t /tmp/pgsql.promote_trigger.5432 ~postgres/incoming_wal %f %p %r 2>&1 | logger -p local1.info -t pitr-standby'
> > -----------------------------------------------------------------------------------------
>
> > As you can see, I didn't set -k to keep a fixed number of WALs; instead I
> > pass the %r parameter, so PostgreSQL controls the number of kept files
> > automatically (or at least it should).
>
> Ok, I hadn't yet set up a standby on 8.3 and so hadn't seen that the
> %r macro obviates the need for the -k flag. So...
>
> The output from pg_standby:
> ------------------------------------
> Trigger file : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file : 00000001.history
> WAL file path : /var/lib/pgsql/incoming_wal/00000001.history
> Restoring to... : pg_xlog/RECOVERYHISTORY
> Sleep interval : 5 seconds
> Max wait interval : 0 forever
> Command for restore : ln -s -f "/var/lib/pgsql/incoming_wal/00000001.history" "pg_xlog/RECOVERYHISTORY"
> Keep archive history : 0000000100000001000000DB and later
> running restore : OK
>
> Trigger file : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file : 0000000100000001000000D9.00000020.backup
> WAL file path : /var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup
> Restoring to... : pg_xlog/RECOVERYHISTORY
> Sleep interval : 5 seconds
> Max wait interval : 0 forever
> Command for restore : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup" "pg_xlog/RECOVERYHISTORY"
> Keep archive history : 0000000100000001000000DB and later
> running restore : OK
>
> Note that here, from the start, postgres is telling the recovery
> command that it only needs from 0000000100000001000000DB and on.
>
> Here's where it gets to restoring the first actual log file:
>
> Trigger file : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file : 0000000100000001000000D9
> WAL file path : /var/lib/pgsql/incoming_wal/0000000100000001000000D9
> Restoring to... : pg_xlog/RECOVERYXLOG
> Sleep interval : 5 seconds
> Max wait interval : 0 forever
> Command for restore : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9" "pg_xlog/RECOVERYXLOG"
> Keep archive history : 0000000100000001000000DB and later
> running restore : OK
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000D9"
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000DA"
>
> Since it says 'OK' but then fails, my guess is that the order of
> operations goes something like this (I could be totally off):
>
> 1. Is /var/lib/pgsql/incoming_wal/0000000100000001000000D9 present? -> OK
> 2. Clean up files older than 0000000100000001000000DB -> Delete
> /var/lib/pgsql/incoming_wal/0000000100000001000000D9 and
> /var/lib/pgsql/incoming_wal/0000000100000001000000DA
> 3. Restore /var/lib/pgsql/incoming_wal/0000000100000001000000D9 -> This is
> where it breaks.
>
> So, the question is: why does the server say that it only needs
> 0000000100000001000000DB and later? Did you clear out your pg_xlog
> directory before starting up the standby?
>

Yes - the value passed for %r looks wrong from the start.
Generally I like %r because I don't need to worry about whether there are
enough WALs to continue recovery after a standby reboot, and I don't keep
many files around at the same time, but I think something is wrong
with it.
And to answer your question - I don't delete any files before starting the
standby.
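
To make the suspected failure concrete, here is a small Python sketch of the cleanup rule as I understand it from the log above: every archived segment whose name sorts before the %r value gets removed. This is only a simplified model, not pg_standby's actual code; the function names and the plain name comparison are my own assumptions. The file names are the ones from the log.

```python
import re

# A WAL segment file name is 24 uppercase hex digits
# (timeline, log, segment). Assumption: names compare as plain strings.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def segments_to_remove(archive_files, restart_file):
    """Return archived WAL segments that sort before restart_file (the %r value)."""
    return sorted(
        name for name in archive_files
        if WAL_SEGMENT_RE.match(name) and name < restart_file
    )


def restart_point_is_plausible(requested_file, restart_file):
    """False when the requested segment (%f) is older than the %r value."""
    if not WAL_SEGMENT_RE.match(requested_file):
        return True  # .history and .backup files are not compared
    return requested_file >= restart_file


archive = [
    "00000001.history",
    "0000000100000001000000D9.00000020.backup",
    "0000000100000001000000D9",
    "0000000100000001000000DA",
    "0000000100000001000000DB",
]
restart = "0000000100000001000000DB"

removed = segments_to_remove(archive, restart)
ok = restart_point_is_plausible("0000000100000001000000D9", restart)
print(removed)  # segments older than the %r value
print(ok)       # whether the %f / %r pair is consistent
```

With these inputs the model removes D9 and DA, which matches the two "removing" lines in the log, and it flags the request for D9 as inconsistent with a %r of DB.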

So it looks like a bug to me - I should probably submit it to
pgsql.bugs - unfortunately ( or fortunately :D ) my test environment
is production now, so I won't be able to reproduce it easily.
