Re: warm standby, pg_standby, invalid checkpoint record

From: Lee Azzarello <lee(at)dropio(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: warm standby, pg_standby, invalid checkpoint record
Date: 2009-03-04 23:23:18
Message-ID: 16b031c0903041523l4cb2b52fg6e50068a043a5915@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

does a file named 0000000100000000000000CD exist anywhere on your disk?

-lee

On Fri, Feb 27, 2009 at 2:47 AM, Brad Wiemerslage <wiemersl(at)yahoo(dot)com> wrote:
>
> I'm attempting to get warm standby up and running with a pair of servers running ubuntu 8.04 and postgresql 8.3.  Been following the docs:
>
> http://www.postgresql.org/docs/8.3/static/warm-standby.html
> http://www.postgresql.org/docs/current/static/pgstandby.html
>
> Also, basically following the ideas here in this blog post:
>
> http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html
>
> I've customized the original script he refers to in the article, which is here in its entirety for reference:
>
> https://s3.amazonaws.com/extras.continuent.com/standby.sh
>
> Here is the meat of my customized script, which runs on the standby.  The postgresql server on the standby is stopped first.
>
> start_backup="SELECT pg_start_backup('my_backup');"
> stop_backup="SELECT pg_stop_backup();"
> echo "$start_backup" | $psql -h$PRIMARY -U myuser -d mydb -e
> rsync --delete -avz -e "ssh -i /path/to/key" myuser(at)$PRIMARY:$PG_DATA/ $PG_DATA
> echo "$stop_backup" | $psql -h $PRIMARY -U myuser -d mydb -e
>
> The files seem to copied over to the standby machine just fine.  Success is reported with respect to the backup commands.  Permissions seem fine.
>
> Next, there are some steps which blow out some files.  As I understand it, you no longer need the files on the standby that were in pg_xlog on the primary.
>
> rm -f $PG_DATA/recovery.*
> rm -f $PG_DATA/8.3/main/logfile
> rm -f $PG_DATA/8.3/main/postmaster.pid
> rm -f $PG_DATA/8.3/main/pg_xlog/0*
> rm -f $PG_DATA/8.3/main/pg_xlog/archive_status/0*
>
> This step seems to work fine.
>
> Then, the archives are pulled.  They are pulled to /mnt/postgresql_archives with this command:
>
> rsync --delete -avz -e "ssh -i /path/to/key" myuser(at)$PRIMARY:$PG_ARCHIVES/ $PG_ARCHIVES
>
> Everything looks good.  I end up with an up to date list of WAL files in /mnt/postgresql_archives on the standby.  Here is a listing:
>
> root(at)standby:/mnt/postgresql_archives# ls
> total 688996
> drwxr-xr-x  2 postgres postgres     4096 2009-02-27 01:13 .
> drwxr-xr-x 14 root     root         4096 2009-02-27 01:17 ..
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 00:19 0000000100000000000000CB
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 00:29 0000000100000000000000CC
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 00:38 0000000100000000000000CD
> -rw-rw----  1 postgres postgres      245 2009-02-27 00:38 0000000100000000000000CD.00000020.backup
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 00:48 0000000100000000000000CE
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 00:54 0000000100000000000000CF
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 00:58 0000000100000000000000D0
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 01:01 0000000100000000000000D1
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 01:03 0000000100000000000000D2
> -rw-rw----  1 postgres postgres 16777216 2009-02-27 01:13 0000000100000000000000D3
>
> Then, the recovery.conf is put in place.  I've tried two different versions, which end up giving me the same error.  Here are the two different versions.
>
> #1: restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -c -d -s 2 -t /mnt/postgresql_archives/pgsql.trigger /mnt/postgresql_archives %f %p >> /mnt/postgresql_archives/standby.log 1>&2'
>
> #2: restore_command = 'cp /mnt/server/archivedir/%f "%p"'
>
> I don't believe that #2 is suitable for warm standby, but just tried it to debug after #1 wouldn't work.  Now, I try to start up the server.  For it to work in standby mode, additional archive files will be pulled from the primary machine on a periodic basis.  I'm using this command, which deletes them on the primary when they are no longer necessary.  It also seems to work fine.
>
> rsync -avz -e "ssh -i /path/to/key" myuser(at)$PRIMARY:$PG_ARCHIVES/ $PG_ARCHIVES
>
> I guess I'm a little confused about exactly what is happening here when the server comes up, but here is the error message I'm getting.  It seems to be looking for the files in pg_pxlog, which is cleared out.  So, the error makes sense.  But isn't it supposed to be looking in /mnt/postgresql_archives per the restore_command(s)?  The files are available there.
>
> 2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,2,,2009-02-27 01:26:52 EST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000100000000000000CD"" (log file 0, segment 20
> 5): No such file or directory",,,,,,,,
> 2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,3,,2009-02-27 01:26:52 EST,,0,LOG,00000,"invalid checkpoint record",,,,,,,,
> 2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,4,,2009-02-27 01:26:52 EST,,0,PANIC,XX000,"could not locate required checkpoint record",,"If you are not restoring from a
>  backup, try removing the file ""/var/lib/postgresql/8.3/main/backup_label"".",,,,,,
> 2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,1,,2009-02-27 01:26:51 EST,,0,LOG,00000,"startup process (PID 7422) was terminated by signal 6: Aborted",,,,,,,,
> 2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,2,,2009-02-27 01:26:51 EST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,
>
> So, I tried copying the files in the /mnt/postgresql_archives over to pg_xlog.  This seemed to work, and the updates were applied.  Never at any point did I get a recovery.done file.  Also, for whatever reason, I was never able to get any debug info from pg_standby in standby.log.
>
> Anyhow, I've burned up a couple days trying to figure this out.  Any help would be much appreciated.
>
> Thanks,
> Brad
>
>
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message raf 2009-03-05 00:32:59 Re: getting 'full' names of functions?
Previous Message Tom Lane 2009-03-04 22:30:10 Re: Database in use?