Strange deadlock error last night

From: "Scott Whitney" <swhitney(at)journyx(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Strange deadlock error last night
Date: 2009-01-13 17:37:00
Message-ID: 20090113173555.80E667E4FD8@mail.int.journyx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

I'm about to begin researching this, but I thought I'd throw this out there
and see if there are any quick responses.

Last night, I got this:

Jan 13 03:31:28 db01 postgres[23537]: [140-2] DETAIL: Process 23537 waits
for AccessShareLock on relation 1260 of database 0; blocked by process
22228.
Jan 13 03:31:28 db01 postgres[23537]: [140-3] Process 22228 waits for
AccessShareLock on relation 1262 of database 0; blocked by process 6816.
Jan 13 03:31:28 db01 postgres[23537]: [140-4] Process 6816 waits for
AccessShareLock on relation 1260 of database 0; blocked by process 14624.
Jan 13 03:31:28 db01 postgres[23537]: [140-5] Process 14624 waits for
AccessExclusiveLock on relation 1260 of database 0; blocked by process
23537.
Jan 13 03:31:28 db01 postgres[14624]: [243-1] ERROR: deadlock detected
Jan 13 03:31:28 db01 postgres[14624]: [243-2] DETAIL: Process 14624 waits
for AccessExclusiveLock on relation 1260 of database 0; blocked by process
22228.
Jan 13 03:31:28 db01 postgres[14624]: [243-3] Process 22228 waits for
AccessShareLock on relation 1262 of database 0; blocked by process 6816.
Jan 13 03:31:28 db01 postgres[14624]: [243-4] Process 6816 waits for
AccessShareLock on relation 1260 of database 0; blocked by process 14624.
Jan 13 03:31:28 db01 postgres[6816]: [54-1] ERROR: deadlock detected
Jan 13 03:31:29 db01 postgres[6816]: [54-2] DETAIL: Process 6816 waits for
AccessShareLock on relation 1260 of database 0; blocked by process 22228.
Jan 13 03:31:29 db01 postgres[6816]: [54-3] Process 22228 waits for
AccessShareLock on relation 1262 of database 0; blocked by process 6816.

Here's what the procs were doing at the time, according to the logs:

** Jan 12 03:32:08 db01 postgres[22228]: [279-1] WARNING: skipping
"pg_database" --- only table or database owner can vacuum it
** Jan 12 03:33:42 db01 postgres[14624]: [242-1] WARNING: skipping
"pg_database" --- only table or database owner can vacuum it
Jan 13 03:30:36 db01 postgres[23537]: [139-1] WARNING: skipping "pg_authid"
--- only table or database owner can vacuum it
Jan 13 03:31:25 db01 postgres[6816]: [53-1] WARNING: skipping
"pg_auth_members" --- only table or database owner can vacuum it

** Note that procs 14624 and 22228 do not appear on the same day at the same
time. I have no record of any pg logging of those processes other than what
you see above. That, unfortunately, doesn't necessarily tell me anything.

It ended up locking up about 250 customer databases until I restarted the
postmaster. This is version 8.1.4. Upgrading right now (even to a minor rev)
is not really an option. This box has been up and running for 306 days. This
postgres level has been installed for..err...well...at least Aug 9, 2006,
based on some dates in the directories.

I can't determine what "database 0" is. It's not a datid. I'm assuming it's
a relative number based on what was going on at the time. We're running
auto-vacuum, as we have been for years. A full vacuum happens every Saturday
night.

Any clever ideas? Requests for more information?

Thanks in advance!

Scott Whitney
Journyx, Inc.

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2009-01-13 21:04:17 Re: Strange deadlock error last night
Previous Message Scott Marlowe 2009-01-12 20:09:19 Re: PGOPTIONS and default_tablespace