Weird Assert failure in GetLockStatusData()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Weird Assert failure in GetLockStatusData()
Date: 2013-01-08 15:39:25
Message-ID: 8053.1357659565@sss.pgh.pa.us
Lists: pgsql-hackers

This is a bit disturbing:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bushpig&dt=2013-01-07%2019%3A15%3A02

The key bit is

[50eb2156.651e:6] LOG:  execute isolationtester_waiting: SELECT 1 FROM pg_locks holder, pg_locks waiter WHERE NOT waiter.granted AND waiter.pid = $1 AND holder.granted AND holder.pid <> $1 AND holder.pid IN (25887, 25888, 25889) AND holder.mode = ANY (CASE waiter.mode WHEN 'AccessShareLock' THEN ARRAY['AccessExclusiveLock'] WHEN 'RowShareLock' THEN ARRAY['ExclusiveLock','AccessExclusiveLock'] WHEN 'RowExclusiveLock' THEN ARRAY['ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareUpdateExclusiveLock' THEN ARRAY['ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareRowExclusiveLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ExclusiveLock' THEN ARRAY['RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'AccessExclusiveLock' THEN ARRAY['AccessShareLock','RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] END) AND holder.locktype IS NOT DISTINCT FROM waiter.locktype AND holder.database IS NOT DISTINCT FROM waiter.database AND holder.relation IS NOT DISTINCT FROM waiter.relation AND holder.page IS NOT DISTINCT FROM waiter.page AND holder.tuple IS NOT DISTINCT FROM waiter.tuple AND holder.virtualxid IS NOT DISTINCT FROM waiter.virtualxid AND holder.transactionid IS NOT DISTINCT FROM waiter.transactionid AND holder.classid IS NOT DISTINCT FROM waiter.classid AND holder.objid IS NOT DISTINCT FROM waiter.objid AND holder.objsubid IS NOT DISTINCT FROM waiter.objsubid
[50eb2156.651e:7] DETAIL:  parameters: $1 = '25889'
TRAP: FailedAssertion("!(el == data->nelements)", File: "lock.c", Line: 3398)
[50eb2103.62ee:2] LOG:  server process (PID 25886) was terminated by signal 6: Aborted
[50eb2103.62ee:3] DETAIL:  Failed process was running: SELECT 1 FROM pg_locks holder, pg_locks waiter WHERE NOT waiter.granted AND waiter.pid = $1 AND holder.granted AND holder.pid <> $1 AND holder.pid IN (25887, 25888, 25889) AND holder.mode = ANY (CASE waiter.mode WHEN 'AccessShareLock' THEN ARRAY['AccessExclusiveLock'] WHEN 'RowShareLock' THEN ARRAY['ExclusiveLock','AccessExclusiveLock'] WHEN 'RowExclusiveLock' THEN ARRAY['ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareUpdateExclusiveLock' THEN ARRAY['ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareRowExclusiveLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ExclusiveLock' THEN ARRAY['RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','E

The assertion failure seems to indicate that the number of
LockMethodProcLockHash entries found by hash_seq_search didn't match the
number that had been counted by hash_get_num_entries immediately before
that.  I don't see any bug in GetLockStatusData itself, so this suggests
that there's something wrong with dynahash's entry counting, or that
somebody somewhere is modifying the shared hash table without holding
the appropriate lock.  The latter seems a bit more likely, given that
this must be a very low-probability bug or we'd have seen it before.
An overlooked locking requirement in a seldom-taken code path would fit
the symptoms.
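
To make the failure mode concrete, here is a toy sketch of the
count-then-scan pattern described above.  It is not the code in lock.c
and does not use dynahash; ToyTable, toy_insert, toy_snapshot and
rogue_insert are all hypothetical names invented for illustration, and
the "interloper" callback merely simulates a writer that modifies the
table without holding the lock the reader relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SLOTS 8

/* Hypothetical stand-in for a shared hash table (NOT PostgreSQL's dynahash). */
typedef struct ToyTable
{
    bool   used[TABLE_SLOTS];
    int    value[TABLE_SLOTS];
    size_t nentries;            /* running entry counter kept by the table */
} ToyTable;

/* Simulates a code path that touches the table without the proper lock. */
typedef void (*interloper_fn) (ToyTable *t);

/* Insert into a specific slot; returns false if the slot is invalid or taken. */
static bool
toy_insert(ToyTable *t, size_t slot, int v)
{
    if (slot >= TABLE_SLOTS || t->used[slot])
        return false;
    t->used[slot] = true;
    t->value[slot] = v;
    t->nentries++;
    return true;
}

/*
 * Read the entry counter first, then walk the table and count what is
 * actually there.  Returns true when the two numbers agree, which is the
 * condition the failed assertion was checking in spirit.
 */
static bool
toy_snapshot(ToyTable *t, interloper_fn interloper,
             size_t *counted, size_t *scanned)
{
    size_t expected = t->nentries;      /* step 1: ask for the count */
    size_t el = 0;

    if (interloper != NULL)
        interloper(t);                  /* simulated unlocked modification */

    for (size_t i = 0; i < TABLE_SLOTS; i++)    /* step 2: sequential scan */
    {
        if (t->used[i])
            el++;
    }

    *counted = expected;
    *scanned = el;
    return el == expected;
}

/* A "seldom-taken code path" that adds an entry behind the reader's back. */
static void
rogue_insert(ToyTable *t)
{
    (void) toy_insert(t, TABLE_SLOTS - 1, 99);
}
```

With no interloper, toy_snapshot() reports matching numbers; passing
rogue_insert makes the scan see one more entry than was counted, which
is the same shape of mismatch as the TRAP above.  A counter that drifts
from the real contents would produce the same symptom, so the sketch
does not distinguish between the two hypotheses.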

Or maybe bushpig just had some weird cosmic-ray hardware failure,
but I don't put a lot of faith in such explanations.

Thoughts?

			regards, tom lane

