Re: BUG #5412: test case produced, possible race condition.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Rusty Conover <rconover(at)infogears(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5412: test case produced, possible race condition.
Date: 2010-04-14 15:32:16
Message-ID: 25124.1271259136@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

I wrote:
> Why would this patch fix anything? It doesn't change the lock status.

I have not been able to reproduce the crash using Rusty's script on my
own machine, but after contemplating his stack trace for a while I have a
theory about what is happening. I think that while we are building a
new relation entry (the RelationBuildDesc call in RelationClearRelation)
for a locally-created relation, we receive an sinval reset event caused
by sinval queue overflow. (That could only happen with a lot of
concurrent catalog update activity, which is why a significant number
of concurrent "job1" clients are needed to provoke the problem.)
The sinval reset will be serviced by RelationCacheInvalidate, which will
blow away any relcache entries with refcount zero, including the one
that the outer instance of RelationClearRelation is trying to rebuild.
So when control returns, the next thing that happens is that we try to
do the equalTupleDescs() comparison against a trashed pointer, as seen
in the stack trace.
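
For readers unfamiliar with the reset mechanism, the following is a toy
model in C, not PostgreSQL's actual sinval code. It assumes a small
fixed-size ring of messages and a single reader; the names ToyQueue,
ToyReader, toy_send and toy_receive, and the capacity of 4, are all
invented for illustration. It shows only the general idea: a reader that
falls a full queue behind can no longer be handed the individual
messages, so it is told to discard everything instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_SIZE 4   /* hypothetical capacity, chosen small for the demo */

/* Toy shared queue: a ring buffer plus a count of messages ever written. */
typedef struct ToyQueue {
    int  msgs[TOY_QUEUE_SIZE];
    long next_write;
} ToyQueue;

/* Toy reader state: how far it has read, and whether it missed messages. */
typedef struct ToyReader {
    long next_read;
    bool reset_pending;
} ToyReader;

/* Append a message. If the reader is a full queue behind, its oldest
 * unread slot is about to be overwritten, so flag a reset for it. */
void toy_send(ToyQueue *q, ToyReader *r, int msg)
{
    assert(q != NULL && r != NULL);
    if (q->next_write - r->next_read >= TOY_QUEUE_SIZE) {
        r->reset_pending = true;
        r->next_read = q->next_write;   /* skip everything it missed */
    }
    q->msgs[q->next_write % TOY_QUEUE_SIZE] = msg;
    q->next_write++;
}

/* Returns 1 and stores a message, 0 if the reader is caught up,
 * or -1 if the reader must throw away all of its cached state. */
int toy_receive(ToyQueue *q, ToyReader *r, int *msg)
{
    assert(q != NULL && r != NULL && msg != NULL);
    if (r->reset_pending) {
        r->reset_pending = false;
        return -1;
    }
    if (r->next_read == q->next_write)
        return 0;
    *msg = q->msgs[r->next_read % TOY_QUEUE_SIZE];
    r->next_read++;
    return 1;
}
```

With five sends and no reads in between, the first toy_receive call
reports a reset, and the reader afterwards sees only the newest message.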

This behavior is new in 8.4.3; before that, RelationClearRelation
temporarily unhooked the target rel from the relcache hash table,
so it wouldn't be found by RelationCacheInvalidate. So that explains
why Rusty's app worked before.
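
The difference between the two behaviors can be sketched with a toy
cache in C. This is an illustrative model under stated assumptions, not
the relcache code: ToyEntry, toy_cache_reset, toy_rebuild and the
unhook_first flag are hypothetical names, and a "destroyed" flag stands
in for freeing memory so that the example stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cache entry; not PostgreSQL's RelationData. */
typedef struct ToyEntry {
    int  refcount;    /* pins held by callers */
    bool in_table;    /* reachable through the cache's lookup table */
    bool destroyed;   /* set instead of calling free() */
} ToyEntry;

/* Stand-in for a cache-wide reset: drop every reachable, unpinned entry. */
static void toy_cache_reset(ToyEntry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].in_table && entries[i].refcount == 0) {
            entries[i].in_table = false;
            entries[i].destroyed = true;
        }
    }
}

/* Rebuild entries[target]. If reset_arrives is true, a reset is serviced
 * in the middle of the rebuild. Returns true if the entry survived. */
bool toy_rebuild(ToyEntry *entries, size_t n, size_t target,
                 bool unhook_first, bool reset_arrives)
{
    assert(entries != NULL && target < n);
    ToyEntry *e = &entries[target];

    if (unhook_first)
        e->in_table = false;    /* hide the entry from the reset */

    if (reset_arrives)
        toy_cache_reset(entries, n);

    if (e->destroyed)
        return false;           /* real code would now touch freed memory */

    e->in_table = true;         /* re-hook (a no-op if never unhooked) */
    return true;
}
```

In this model an unpinned entry that stays hooked is destroyed when the
reset arrives mid-rebuild, while the same entry survives if it is
unhooked first or if it is pinned.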

In short, then, Heikki's fix is good, although it desperately needs
some comment updates: there's effectively an API change happening here,
because RelationClearRelation's contract with its caller is not the
same as before. I'll clean it up a bit and apply. It will need to
go into all the branches this patch went into:
http://archives.postgresql.org/pgsql-committers/2010-01/msg00186.php

regards, tom lane
