From: | "Nicholas Vinen" <hb(at)pg(dot)x256(dot)org> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #2285: Can not access database after successful PITR - file naming problems |
Date: | 2006-02-27 13:46:26 |
Message-ID: | 20060227134626.E4695F0B03@svr2.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged online:
Bug reference: 2285
Logged by: Nicholas Vinen
Email address: hb(at)pg(dot)x256(dot)org
PostgreSQL version: 8.1.3
Operating system: Gentoo Linux x86
Description: Can not access database after successful PITR - file
naming problems
Details:
For debugging/testing purposes, I have taken to performing a PITR on a test
machine from the production database back-ups. I restore a snap-shot of the
production server which was taken before the time I am interested in
debugging, then use the PITR method to get the database to the point I want
to test at. I can then "rewind" the database and test again by restoring
again.
This used to work (either before 8.0 or before 8.1, I'm not sure). I also
have this back-up in case the production database server dies, but can't
really test restoring it on the production machine except when data is lost,
otherwise I will interrupt service. The machines are almost identical, the
only real difference is that one is a Pentium 3 and one is a Pentium 4, so I
don't see why that should matter. They are both running virtually identical
Linux installations.
Now, when I try to use PITR to restore the database on my test server, the
PITR succeeds as normal, but I can not access the database with an error
like:
psql: FATAL: could not open relation 1663/16385/605464: No such file or
directory
This happens when I attempt to connect to my database. I can connect to some
of the others (such as the template databases) fine.
Interestingly, the 605464 file was in the back-up snapshot, but the process
of performing the PITR seems to rename the file, but it's still looking
under the old name. This is fully reproducible, so if I am not providing
sufficient information here, just let me know what you need in order to fix
this.
Here is a log of what I have done up to the error:
rt2 ~ # cd /var/lib/postgresql/data
rt2 data # rm -rf *
rt2 data # gzip -cd /backup/Helpdesk/Database/Snapshots/Weekly/2006-06\ \
Sun\ 12\ Feb.io.gz | cpio -i
3229320 blocks
rt2 data # ls -al base/16385/605464
-rw------- 1 postgres postgres 40960 Feb 27 02:57 base/16385/605464
rt2 data # rm pg_xlog/*
rm: cannot remove `pg_xlog/archive_status': Is a directory
rt2 data # cp /data/postgresql/recovery.conf
/data/postgresql/postgresql.conf .
rt2 data # chown postgres *
rt2 data # /etc/init.d/postgresql start
* Starting PostgreSQL ...
rt2 data # tail -f /var/log/postgres/current
Feb 27 03:06:12 [postgres] [1-1] LOG: could not create IPv6 socket: Address
family not supported by protocol
Feb 27 03:06:12 [postgres] [2-1] LOG: database system was interrupted at
2006-02-12 01:30:02 PST
Feb 27 03:06:12 [postgres] [3-1] LOG: starting archive recovery
Feb 27 03:06:12 [postgres] [4-1] LOG: restore_command = "gzip -cd
/backup/Helpdesk/Database/TransactionLog/"%f">"%p""
Feb 27 03:06:12 [postgres] [5-1] LOG: recovery_target_time = 2006-02-17
08:20:00-08
Feb 27 03:06:12 [postgres] [6-1] LOG: restored log file
"000000010000000B00000002.00A9FA60.backup" from archive
Feb 27 03:06:14 [postgres] [7-1] LOG: restored log file
"000000010000000B00000002" from archive
Feb 27 03:06:14 [postgres] [8-1] LOG: checkpoint record is at B/2A9FA60
Feb 27 03:06:14 [postgres] [9-1] LOG: redo record is at B/2A9FA60; undo
record is at 0/0; shutdown FALSE
Feb 27 03:06:14 [postgres] [10-1] LOG: next transaction ID: 34066581; next
OID: 611376
Feb 27 03:06:14 [postgres] [11-1] LOG: next MultiXactId: 633; next
MultiXactOffset: 1265
Feb 27 03:06:14 [postgres] [12-1] LOG: automatic recovery in progress
Feb 27 03:06:14 [postgres] [13-1] LOG: redo starts at B/2A9FAA4
Feb 27 03:06:27 [postgres] [14-1] LOG: restored log file
"000000010000000B00000003" from archive
Feb 27 03:06:38 [postgres] [15-1] LOG: restored log file
"000000010000000B00000004" from archive
Feb 27 03:06:41 [postgres] [16-1] LOG: restored log file
"000000010000000B00000005" from archive
....
Feb 27 03:47:13 [postgres] [682-1] LOG: restored log file
"000000010000000D000000A1" from archive
Feb 27 03:47:17 [postgres] [683-1] LOG: restored log file
"000000010000000D000000A2" from archive
Feb 27 03:47:21 [postgres] [684-1] LOG: restored log file
"000000010000000D000000A3" from archive
Feb 27 03:47:21 [postgres] [685-1] LOG: recovery stopping before commit of
transaction 42586328, time 2006-02-17 08:20:01 PST
Feb 27 03:47:21 [postgres] [686-1] LOG: redo done at D/A35794EC
Feb 27 03:47:21 [postgres] [687-1] LOG: selected new timeline ID: 2
Feb 27 03:47:22 [postgres] [688-1] LOG: archive recovery complete
Feb 27 03:47:32 [postgres] [689-1] LOG: database system is ready
Feb 27 03:47:32 [postgres] [690-1] LOG: transaction ID wrap limit is
1103292637, limited by database "postgres"
Feb 27 05:39:54 [postgres] [2-1] LOG: invalid server process ID -1 (new
error in 8.1.3)
rt2 data # psql -U postgres rt3
psql: FATAL: could not open relation 1663/16385/605464: No such file or
directory
rt2 data # ls -al base/16385/605464
rt2 data #
recovery.conf:
restore_command = 'gzip -cd
/backup/Helpdesk/Database/TransactionLog/"%f">"%p"'
recovery_target_time = '2006-02-17 16:20:00+0';
I believe this should work. I checked the PITR limitations and I don't
believe I've broken any of them. I also belive this should work between
seperate machines of the same architecture. It certainly has in the past.
Please let me know how to proceed.
By the way, I can access *some* of my databases (the smaller ones), just not
the main one. I suspect perhaps it's due to an ALTER TABLE or similar
command I committed on the main database inbetween the snapshot and the
restore point, which is not being handled properly during the transation log
replay.
Thanks,
Nicholas
From | Date | Subject | |
---|---|---|---|
Next Message | David Sauer | 2006-02-27 14:39:37 | BUG #2286: Wrong index creation with cs_CZ locales and HYPHEN |
Previous Message | suvarnat | 2006-02-27 11:52:19 | BUG #2284: missing sequence number |