From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Weird Assert failure in GetLockStatusData()
Date: 2013-01-08 15:39:25
Message-ID: 8053.1357659565@sss.pgh.pa.us
Lists: pgsql-hackers

This is a bit disturbing:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bushpig&dt=2013-01-07%2019%3A15%3A02

The key bit is

[50eb2156.651e:6] LOG: execute isolationtester_waiting: SELECT 1 FROM pg_locks holder, pg_locks waiter WHERE NOT waiter.granted AND waiter.pid = $1 AND holder.granted AND holder.pid <> $1 AND holder.pid IN (25887, 25888, 25889) AND holder.mode = ANY (CASE waiter.mode WHEN 'AccessShareLock' THEN ARRAY['AccessExclusiveLock'] WHEN 'RowShareLock' THEN ARRAY['ExclusiveLock','AccessExclusiveLock'] WHEN 'RowExclusiveLock' THEN ARRAY['ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareUpdateExclusiveLock' THEN ARRAY['ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareRowExclusiveLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ExclusiveLock' THEN ARRAY['RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'AccessExclusiveLock' THEN ARRAY['AccessShareLock','RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] END) AND holder.locktype IS NOT DISTINCT FROM waiter.locktype AND holder.database IS NOT DISTINCT FROM waiter.database AND holder.relation IS NOT DISTINCT FROM waiter.relation AND holder.page IS NOT DISTINCT FROM waiter.page AND holder.tuple IS NOT DISTINCT FROM waiter.tuple AND holder.virtualxid IS NOT DISTINCT FROM waiter.virtualxid AND holder.transactionid IS NOT DISTINCT FROM waiter.transactionid AND holder.classid IS NOT DISTINCT FROM waiter.classid AND holder.objid IS NOT DISTINCT FROM waiter.objid AND holder.objsubid IS NOT DISTINCT FROM waiter.objsubid
[50eb2156.651e:7] DETAIL: parameters: $1 = '25889'
TRAP: FailedAssertion("!(el == data->nelements)", File: "lock.c", Line: 3398)
[50eb2103.62ee:2] LOG: server process (PID 25886) was terminated by signal 6: Aborted
[50eb2103.62ee:3] DETAIL: Failed process was running: SELECT 1 FROM pg_locks holder, pg_locks waiter WHERE NOT waiter.granted AND waiter.pid = $1 AND holder.granted AND holder.pid <> $1 AND holder.pid IN (25887, 25888, 25889) AND holder.mode = ANY (CASE waiter.mode WHEN 'AccessShareLock' THEN ARRAY['AccessExclusiveLock'] WHEN 'RowShareLock' THEN ARRAY['ExclusiveLock','AccessExclusiveLock'] WHEN 'RowExclusiveLock' THEN ARRAY['ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareUpdateExclusiveLock' THEN ARRAY['ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareRowExclusiveLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ExclusiveLock' THEN ARRAY['RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','E

The assertion failure seems to indicate that the number of
LockMethodProcLockHash entries found by hash_seq_search didn't match the
number that had been counted by hash_get_num_entries immediately before
that. I don't see any bug in GetLockStatusData itself, so this suggests
that there's something wrong with dynahash's entry counting, or that
somebody somewhere is modifying the shared hash table without holding
the appropriate lock. The latter seems a bit more likely, given that
this must be a very low-probability bug or we'd have seen it before.
An overlooked locking requirement in a seldom-taken code path would fit
the symptoms.

Or maybe bushpig just had some weird cosmic-ray hardware failure,
but I don't put a lot of faith in such explanations.

Thoughts?

regards, tom lane
