Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"

From: "John Smith" <sodgodofall(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
Date: 2008-06-30 17:40:03
Message-ID: b88f0d670806301040w6f7e8e61x9e8e61f7540b7480@mail.gmail.com
Lists: pgsql-bugs

Hi,

I hit an issue running PG 8.2.3 with the continuous archiving feature
where I was unable to recover from the backup. I was wondering if
this may be related to bug #3245?

These are the steps that occurred before I saw this problem:

1. A transaction was prepared (PREPARE TRANSACTION).
2. A base backup of the database was taken to a warm standby system.
3. COMMIT PREPARED was issued. It never finished because it hit a PANIC:

2008-06-17 23:53:53.206 Local time zone must be set--see zic manual
page PANIC: failed to re-find shared lock object
2008-06-17 23:53:53.207 Local time zone must be set--see zic manual
page STATEMENT: commit prepared '148969' ;

I believe this panic is probably bug #3245 based on the description of
that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php

At this point I attempted a recovery using the continuous archive
backup on the warm standby system. Instead of recovering correctly, it
hit this FATAL error saying an AccessShareLock was already held:

2008-06-18 00:05:34.045 Local time zone must be set--see zic manual
page LOG: database system was interrupted at 2008-06-17 23:53:16
Local time zone must be set--see zic manual page
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: checkpoint record is at 70/E600DC18
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: redo record is at 70/E600DC18; undo record is at 0/0;
shutdown FALSE
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: next transaction ID: 0/1099178; next OID: 413234
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: next MultiXactId: 1; next MultiXactOffset: 0
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: database system was not properly shut down; automatic
recovery in progress
2008-06-18 00:05:34.105 Local time zone must be set--see zic manual
page LOG: redo starts at 70/E600DC68
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual
page LOG: could not open file "pg_xlog/0000000100000070000000E7" (log
file 112, segment 231): No such file or directory
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual
page LOG: redo done at 70/E600DC68
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099169
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099156
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099157
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099161
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099164
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099162
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099166
2008-06-18 00:05:34.294 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099131
2008-06-18 00:05:34.298 Local time zone must be set--see zic manual
page FATAL: lock AccessShareLock on object 16477/244169/0 is already
held
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
page LOG: startup process (PID 17377) exited with exit code 1
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
page LOG: aborting startup due to startup process failure
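
To make a log like the one above easier to triage, here is a minimal
sketch that pulls out the prepared transaction IDs and the lock named
in the FATAL message. It assumes the log text is pasted in with the
mail client's line wrapping intact; the helper name
summarize_recovery_log and the SAMPLE excerpt are illustrative only and
not part of PostgreSQL.

```python
import re

# Excerpt of the recovery log above, including the mail client's line wrapping.
SAMPLE = """\
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099169
2008-06-18 00:05:34.294 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099131
2008-06-18 00:05:34.298 Local time zone must be set--see zic manual
page FATAL: lock AccessShareLock on object 16477/244169/0 is already
held
"""


def summarize_recovery_log(text):
    """Hypothetical helper: return (prepared XIDs, FATAL lock details or None)."""
    # Collapse all whitespace so a message matches wherever the wrap fell.
    flat = " ".join(text.split())
    xids = [int(x) for x in
            re.findall(r"recovering prepared transaction (\d+)", flat)]
    m = re.search(
        r"FATAL: lock (\w+) on object (\d+)/(\d+)/(\d+) is already held", flat)
    lock = None
    if m:
        # Keep the three numbers exactly as printed in the message.
        lock = {"mode": m.group(1),
                "object": tuple(int(n) for n in m.group(2, 3, 4))}
    return xids, lock


if __name__ == "__main__":
    xids, lock = summarize_recovery_log(SAMPLE)
    print(xids)
    print(lock)
```

Run against the full log, this lists every transaction that was being
recovered before startup aborted, which may help narrow down which
prepared transaction held the conflicting lock.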

Is this FATAL error seen on recovery a different bug or is it just a
direct result of bug #3245?

Unfortunately, I do not have a way to deterministically reproduce this
problem, but I have seen it 3 times so far.

thanks,

John
