Re: Hot Standby 0.2.1

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby 0.2.1
Date: 2009-09-23 09:07:19
Message-ID: 4AB9E547.9040602@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The logic in the lock manager to track the number of held
AccessExclusiveLocks (with ProcArrayIncrementNumHeldLocks and
ProcArrayDecrementNumHeldLocks) seems to be broken. I added an Assertion
into ProcArrayDecrementNumHeldLocks:

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1401,6 +1401,7 @@ ProcArrayIncrementNumHeldLocks(PGPROC *proc)
void
ProcArrayDecrementNumHeldLocks(PGPROC *proc)
{
+ Assert(proc->numHeldLocks > 0);
proc->numHeldLocks--;
}

This tripped the assertion:

postgres=# CREATE TABLE foo (id int4 primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Making matters worse, the primary server refuses to startup up after
that, tripping the assertion again in crash recovery:

$ bin/postmaster -D data
LOG: database system was interrupted while in recovery at 2009-09-23
11:56:15 EEST
HINT: This probably means that some data is corrupted and you will have
to use the last backup for recovery.
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/32000070
LOG: REDO @ 0/32000070; LSN 0/320000AC: prev 0/32000020; xid 0; len 32:
Heap2 - clean: rel 1663/11562/1249; blk 32 remxid 4352
LOG: consistent recovery state reached
LOG: REDO @ 0/320000AC; LSN 0/320000CC: prev 0/32000070; xid 0; len 4:
XLOG - nextOid: 24600
LOG: REDO @ 0/320000CC; LSN 0/320000F4: prev 0/320000AC; xid 0; len 12:
Storage - file create: base/11562/16408
LOG: REDO @ 0/320000F4; LSN 0/3200011C: prev 0/320000CC; xid 4364; len
12: Relation - exclusive relation lock: xid 4364 db 11562 rel 16408
LOG: REDO @ 0/3200011C; LSN 0/320001D8: prev 0/320000F4; xid 4364; len
159: Heap - insert: rel 1663/11562/1259; tid 5/4
...
LOG: REDO @ 0/32004754; LSN 0/32004878: prev 0/320046A8; xid 4364; len
264: Transaction - commit: 2009-09-23 11:55:51.888398+03; 15 inval
msgs:catcache id38 catcache id37 catcache id38 catcache id37 catcache
id38 catcache id37 catcache id7 catcache id6 catcache id26 smgr relcache
smgr relcache smgr relcache
TRAP: FailedAssertion("!(proc->numHeldLocks > 0)", File: "procarray.c",
Line: 1404)
LOG: startup process (PID 27430) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure

I'm sure that's just a simple bug somewhere, but it highlights that we
need be careful to avoid putting any extra work into the normal recovery
path. Otherwise bugs in hot standby related code can cause crash
recovery to fail.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Roger Leigh 2009-09-23 09:16:28 Re: Unicode UTF-8 table formatting for psql text output
Previous Message Heikki Linnakangas 2009-09-23 08:13:45 Re: Hot Standby 0.2.1