Re: recovery_target_time ignored ?

From: Venkata Balaji N <nag1010(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery_target_time ignored ?
Date: 2015-03-25 09:03:05
Message-ID: CAEyp7J-T8hP0H9nDWwyWR2dcugiHCo-GA66L7uZd7F+1im9ZHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 25, 2015 at 1:28 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 3/24/15 6:12 AM, Venkata Balaji N wrote:
> >
> > On Tue, Mar 24, 2015 at 9:54 AM, David Steele <david(at)pgmasters(dot)net
> > <mailto:david(at)pgmasters(dot)net>> wrote:
> >
> > On 3/23/15 12:42 AM, Venkata Balaji N wrote:
> > > Hi,
> > >
> > > Assuming that this might require a patch, i am posting this in
> > > pgsql-hackers. Apologies, if this is not the appropriate mailing
> list to
> > > start this discussion.
> > >
> > > I performed a PITR and saw the below message in the log file is a
> bit
> > > confusing.
> > >
> > > 2015-03-23 13:49:09.816 GMT-10 DB= PID=4707 LOG: *database system
> was
> > > interrupted; last known up at 2015-03-23 10:30:26 GMT-10*
> > > 2015-03-23 13:49:09.817 GMT-10 DB= PID=4707 LOG: *starting
> > point-in-time
> > > recovery to 2015-03-23 10:00:26+10*
> > > 2015-03-23 13:49:09.827 GMT-10 DB= PID=4707 LOG: restored log file
> > > "000000010000000B00000020" from archive
> > > 2015-03-23 13:49:09.888 GMT-10 DB= PID=4707 LOG: redo starts at
> B/20000090
> > > 2015-03-23 13:49:09.937 GMT-10 DB= PID=4707 LOG: consistent
> recovery
> > > state reached at B/200000B8
> > > 2015-03-23 13:49:09.947 GMT-10 DB= PID=4707 LOG: restored log file
> > > "000000010000000B00000021" from archive
> > > 2015-03-23 13:49:09.950 GMT-10 DB= PID=4707 LOG: *recovery
> stopping
> > > before commit of transaction 16267, time 2015-03-23
> 13:22:37.53007+10*
> > >
> > >
> > > By mistake i gave "recovery_target_time" as 10:00 GMT which is
> 25/30
> > > minutes behind the backup start/end time registered in the
> backup_label.
> > >
> > > The parameter recovery_target_time is ignored and recovery proceeds
> > > further applying all the available WAL Archive files finally ends
> up
> > > bringing up the database.
> > >
> > > I think it would make sense if the recovery does not proceed any
> further
> > > and error out with a message like "recovery_target_time is behind
> the
> > > backup time.. please consider using the backup taken prior to the
> > > recovery_target_time"
> >
> > I just tried it with 9.3.5 and I do get an error:
> >
> > LOG: starting point-in-time recovery to 2015-03-23
> 17:26:02.721307-04
> > LOG: restored log file "000000010000000000000003" from archive
> > LOG: redo starts at 0/30000C8
> > LOG: recovery stopping before commit of transaction 1001, time
> > 2015-03-23 18:26:01.012593-04
> > LOG: redo done at 0/3000228
> > FATAL: requested recovery stop point is before consistent recovery
> > point
> >
> >
> > That makes more sense. This is what i was expecting to happen. Then, i
> > think it is something to do with the timestamp format.
> >
> >
> > Here's my recovery.conf file:
> >
> > restore_command = '/usr/bin/pg_backrest.pl <http://pg_backrest.pl>
> > --stanza=db archive-get %f "%p"'
> > recovery_target_time = '2015-03-23 17:26:02.721307 EDT'
> >
> > > recovery.conf file is as follows :
> > >
> > > restore_command='cp /data/pgdata9400backup/pgwalarch9400backup/%f
> %p '
> > > recovery_target_time='2015-03-23 10:00:26 GMT-10'
> > > recovery_target_inclusive='true'
> >
> > You have '2015-03-23 10:00:26 GMT-10' in recovery.conf but the log
> says
> > 'starting point-in-time recovery to 2015-03-23 10:00:26+10'. Note
> the -
> > vs +.
> >
> >
> > This is my confusion too. I picked up the time format from the backup
> label.
> >
> > Could you check your log and recovery.conf and make sure the timezone
> > offsets are actually different?
> >
> >
> > I am not sure why the timestamp is taken as "2015-03-23 10:00:26+10"
> > for "2015-03-23 10:00:26 GMT-10'".
> >
> > My another system's timestamp format is also AEDT. I did another test
> > and I get the same problem.
> >
> > Below is the text from the log file. I gave a recovery_target_time
> > almost a day behind the consistent recovery point. Still, the
> > recovery_target_time is taken as "*2015-03-23 11:07:10+11*" for
> > "*2015-03-23 11:07:10 AEDT*"
> >
> > 2015-03-24 18:42:44.608 AEDT LOG: database system was interrupted;
> > last known up at *2015-03-24 18:20:53 AEDT*
> >
> > 2015-03-24 18:42:44.608 AEDT LOG: starting point-in-time recovery
> > to *2015-03-23 11:07:10+11*
> >
> > cp: /disk3/pgwalarch9401/0000000100000003000000FE: No such file or
> > directory
> >
> > 2015-03-24 18:42:44.626 AEDT LOG: record with zero length at
> 3/FE000090
> >
> > Below is my recovery.conf file
> >
> > restore_command='cp /disk3/pgwalarch9401/%f %p'
> >
> > recovery_target_time='*2015-03-23 11:07:10 AEDT*'
> >
> > recovery_target_inclusive=true
> >
> > I am checking if this has something to do with my system timestamp
> format.
> >
> > Not sure what am i missing. Do i need to give any special time format ?
>
> This is a weird one. I can reproduce the problem by setting
> timezone=timezone='Australia/Sydney'. I also tried setting
> log_timezone=timezone='Australia/Sydney' and my system clock to
> 'Australia/Sydney' but I still saw the same issue.
>
> I'm testing this using unit tests for some backup software I'm working on.
>
> 1) Postgres is inited and started:
>
> initdb -D /test/db/common -A trust;pg_ctl start -o "-c port=6543 -c
> checkpoint_segments=1 -c log_timezone='Australia/Sydney' -c
> timezone='Australia/Sydney' -c wal_level=hot_standby -c archive_mode=on
> -c archive_command='/test/bin/pg_backrest.pl --stanza=db archive-push
> %p' -c unix_socket_directories='/test/db'" -D /test/db/common -l
> /test/db/common/postgresql.log -w -s
>
> 2) The backup/restore unit tests run
>
> 2015-03-24 09:56:19 T00 INFO: run 001 - rmt 0, arc_async 0, cmp 0
> 2015-03-24 09:56:23 T00 INFO: full backup (insert during backup)
> 2015-03-24 09:56:26 T00 INFO: time target is 2015-03-25
> 00:56:26.783117 EST
> 2015-03-24 09:56:26 T00 INFO: incr backup (update during backup)
> 2015-03-24 09:56:30 T00 INFO: xid target is 1007
> 2015-03-24 09:56:30 T00 INFO: name target is backrest
> 2015-03-24 09:56:30 T00 INFO: testing recovery type = default
> 2015-03-24 09:56:30 T00 INFO: restore, backup
> '20150324-095626F', type 'default', expect exit 113 (postmaster running)
> 2015-03-24 09:56:33 T00 INFO: restore, backup
> '20150324-095626F', type 'default', expect exit 115 (path not empty)
> 2015-03-24 09:56:33 T00 INFO: restore, backup
> '20150324-095626F', type 'default'
> 2015-03-24 09:56:40 T00 INFO: testing recovery type = xid
> 2015-03-24 09:56:41 T00 INFO: restore, force, backup
> '20150324-095626F', type 'xid', target '1007', resume
> 2015-03-24 09:56:47 T00 INFO: testing recovery type = preserve
> 2015-03-24 09:56:48 T00 INFO: restore delta, backup
> '20150324-095626F', type 'preserve'
> 2015-03-24 09:56:54 T00 INFO: testing recovery type = time
> 2015-03-24 09:56:55 T00 INFO: restore delta, backup
> '20150324-095626F', type 'time', target '2015-03-25 00:56:26.783117 EST'
> expected value 'time' from 'select message from test' but actual was
> 'name' at lib/BackRestTest/BackupTest.pm line 178.
>
> 3) The "recovery type = time" test is failing because it's running to
> the end of the WAL stream rather than stopping at the appropriate time.
>

This is the problem i am facing.

> Here's the recovery.conf:
>
> restore_command = '/test/bin/pg_backrest.pl --stanza=db archive-get %f
> "%p"'
> recovery_target_time = '2015-03-25 00:56:26.783117 EST'
>
> And the log file:
>
> LOG: database system was interrupted; last known up at 2015-03-25
> 00:56:24 EST
> LOG: creating missing WAL directory "pg_xlog/archive_status"
> LOG: starting point-in-time recovery to 2015-03-25 16:56:26.783117+11
> LOG: restored log file "000000010000000000000003" from archive
> LOG: redo starts at 0/30000C8
> LOG: consistent recovery state reached at 0/3000280
> LOG: restored log file "000000010000000000000004" from archive
> LOG: restored log file "000000010000000000000005" from archive
> FATAL: the database system is starting up
> LOG: restored log file "000000010000000000000006" from archive
> LOG: restored log file "000000010000000000000007" from archive
> LOG: restored log file "000000010000000000000008" from archive
> LOG: restored log file "000000010000000000000009" from archive
> LOG: restored log file "00000001000000000000000A" from archive
> FATAL: the database system is starting up
> LOG: restored log file "00000001000000000000000B" from archive
> LOG: restored log file "00000001000000000000000C" from archive
> LOG: redo done at 0/C000130
> LOG: last completed transaction was at log time 2015-03-25
> 00:56:30.658752+11
> LOG: restored log file "00000001000000000000000C" from archive
> FATAL: the database system is starting up
> LOG: restored log file "00000002.history" from archive
> LOG: restored log file "00000003.history" from archive
> LOG: restored log file "00000004.history" from archive
> LOG: selected new timeline ID: 5
> LOG: archive recovery complete
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
>
> There's definitely some kind of multiple TZ conversion going on here,
> but I'm not see why. However, I did get the test to work again by
> removing the TZ from the timestamp in recovery.conf:

> restore_command = '/test/bin/pg_backrest.pl --stanza=db archive-get %f
> "%p"'
> recovery_target_time = '2015-03-25 01:08:45.765919'
>
> Here's the log:
>
> LOG: database system was interrupted; last known up at 2015-03-25
> 01:08:43 EST
> LOG: creating missing WAL directory "pg_xlog/archive_status"
> LOG: starting point-in-time recovery to 2015-03-25 01:08:45.765919+11
> LOG: restored log file "000000010000000000000003" from archive
> LOG: redo starts at 0/30000C8
> LOG: consistent recovery state reached at 0/3000280
> LOG: restored log file "000000010000000000000004" from archive
> LOG: restored log file "000000010000000000000005" from archive
> LOG: recovery stopping before commit of transaction 1003, time
> 2015-03-25 01:08:45.770163+11
> LOG: redo done at 0/5013900
> LOG: last completed transaction was at log time 2015-03-25
> 01:08:45.765465+11
> LOG: restored log file "00000002.history" from archive
> FATAL: the database system is starting up
> LOG: restored log file "00000003.history" from archive
> LOG: restored log file "00000004.history" from archive
> LOG: selected new timeline ID: 5
> LOG: archive recovery complete
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
>
> You can see that the timestamps now match up and the test was
> successful. I know this doesn't completely solve the problem, but
> hopefully it will get you going again and might provide a clue to the
> underlying behavior.
>

Your logs above also show that the timestamp is taken as
recovery_target_time+11.

Yeah, i am trying to understand the conversion mechanism here. Need to do a
bit more testing.

> There may also be some confusion because EST is used in both Australia
> and the U.S (though it is EDT here now). I wasn't able to set the
> timezone on Postgres to AEDT or get any output that said AEDT. I'd be
> curious to know how you did that.
>

I am testing in 9.4.1. You mentioned that, you are using PG Version 9.3.5.
There are lot of timezone related fixes in 9.3.6.
In my OSX laptop which is has AEDT timezone settings, version 9.3.6 by
default picks up 'Australia/ACT' timezone at the time of initdb and the
same does not happen in 9.3.5. That might be the reason you are unable to
set Australian timezone.

Regards,
Venkata Balaji N

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shigeru HANADA 2015-03-25 09:44:39 Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Previous Message Bruce Momjian 2015-03-25 07:59:52 Re: printing table in asciidoc with psql