Re: BUG #3245: PANIC: failed to re-find shared lock object

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 15:06:52
Message-ID: 3393.1177427212@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers pgsql-patches

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> I briefly went through all callers of hash_seq_init. The only place
> where we explicitly rely on being able to add entries to a hash table
> while scanning it is in tbm_lossify. There's more complex loops in
> portalmem.c and relcache.c, which I think are safe, but would need to
> look closer. There's also the pg_prepared_statement
> set-returning-function that keeps a scan open across calls, which seems
> error-prone.

The pending-fsync stuff in md.c is also expecting to be able to add
entries during a scan.

I don't think we can go in the direction of forbidding insertions during
a scan --- as the case at hand shows, it's just not always obvious that
that could happen, and finding/fixing such a problem is nigh impossible.
(We were darn fortunate to be able to reproduce this one.) Plus we have
a couple of places where it's really necessary to be able to do it,
anyway.

The only answer I can see that seems reasonably robust is to change
dynahash.c so that it tracks whether any seq_search scans are open on a
hashtable, and doesn't carry out any splits while one is. This wouldn't
cost anything noticeable in performance, assuming that not very many
splits are postponed. The PITA aspect of it is that we'd need to add
bookkeeping mechanisms to ensure that the count of active scans gets
cleaned up on error exit. It's not like we've not got lots of those,
though.
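
A minimal sketch of the idea, for illustration only: a toy model that keeps a count of open scans per table and skips the split while that count is nonzero. This is not dynahash.c code; the ToyHash struct, its fields, and the toy_* functions are all hypothetical names, and the "split" is reduced to bumping a bucket counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy table: models only the bookkeeping, not real storage. */
typedef struct ToyHash
{
    long nentries;      /* number of entries inserted so far */
    long nbuckets;      /* current number of buckets */
    long fill_factor;   /* target maximum entries per bucket */
    int  active_scans;  /* number of sequential scans currently open */
    long deferred;      /* inserts that wanted a split but skipped it */
} ToyHash;

void
toy_init(ToyHash *h, long nbuckets, long fill_factor)
{
    h->nentries = 0;
    h->nbuckets = nbuckets;
    h->fill_factor = fill_factor;
    h->active_scans = 0;
    h->deferred = 0;
}

/* Opening a scan only bumps the counter. */
void
toy_scan_begin(ToyHash *h)
{
    h->active_scans++;
}

/* Closing a scan (normally, or via error cleanup) drops the counter. */
void
toy_scan_end(ToyHash *h)
{
    assert(h->active_scans > 0);
    h->active_scans--;
}

/*
 * Insert one entry.  Returns true if a split was carried out.
 * Insertion itself always succeeds; only the split is postponed
 * while any scan is open on this table.
 */
bool
toy_insert(ToyHash *h)
{
    h->nentries++;

    if (h->nentries <= h->nbuckets * h->fill_factor)
        return false;           /* table is not over-full, nothing to do */

    if (h->active_scans > 0)
    {
        h->deferred++;          /* over-full, but a scan is open: wait */
        return false;
    }

    h->nbuckets++;              /* stand-in for a real bucket split */
    return true;
}
```

In this model a postponed split is simply picked up by the next insertion that finds the table over-full with no scan open, so the only cost of a long-running scan is a temporarily longer bucket chain.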

Possibly we could simplify matters a bit by not worrying about cleaning
up leaked counts at subtransaction abort, ie, the list of open scans
would only get forced to empty at top transaction end. This carries a
slightly higher risk of meaningful performance degradation, but in
practice I doubt it's a big problem. If we agreed to that, then we'd not
need ResourceOwner support --- it could be handled like LWLock counts.
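
A rough sketch of that simpler cleanup scheme, assuming a fixed-size list of open scans that is just reset at top-level transaction end. The names here (register_scan, deregister_scan, has_open_scan, at_top_xact_end, MAX_OPEN_SCANS) are hypothetical and chosen for illustration; nothing below is existing backend code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPEN_SCANS 100      /* assumed fixed limit for this sketch */

/* Tables that currently have a scan open, in no particular order. */
static const void *open_scan_tables[MAX_OPEN_SCANS];
static int num_open_scans = 0;

/* Record that a scan was started on 'table'.  Returns 0, or -1 if full. */
int
register_scan(const void *table)
{
    if (num_open_scans >= MAX_OPEN_SCANS)
        return -1;
    open_scan_tables[num_open_scans++] = table;
    return 0;
}

/* Forget one scan on 'table', if any is registered. */
void
deregister_scan(const void *table)
{
    for (int i = num_open_scans - 1; i >= 0; i--)
    {
        if (open_scan_tables[i] == table)
        {
            /* Fill the hole with the last entry and shrink the list. */
            open_scan_tables[i] = open_scan_tables[num_open_scans - 1];
            num_open_scans--;
            return;
        }
    }
}

/* Would a split on 'table' have to be postponed right now? */
int
has_open_scan(const void *table)
{
    for (int i = 0; i < num_open_scans; i++)
    {
        if (open_scan_tables[i] == table)
            return 1;
    }
    return 0;
}

/*
 * Called once at top-level transaction end (commit or abort).
 * Drops every entry, including ones leaked by an error exit that
 * never reached deregister_scan().  Nothing is done at
 * subtransaction abort, which is the simplification discussed above.
 */
void
at_top_xact_end(void)
{
    num_open_scans = 0;
}
```

The tradeoff is visible in the sketch: an entry leaked inside an aborted subtransaction keeps has_open_scan() returning true, and so keeps splits postponed on that table, until the enclosing top-level transaction ends.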

pg_prepared_statement is simply broken --- what if the next-to-scan
statement is deleted between calls? It'll have to be changed.

Comments?

regards, tom lane
