On Mon, Jul 23, 2012 at 8:16 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Compiling 3a0e4d with -DCLOBBER_CACHE_ALWAYS causes initdb to segfault.
> It looks like the tuple was allocated in the EventTriggerCacheContext,
> and before the tuple was freed, another invalidation event came along
> and wiped out that context.
> It seems like the cleanest fix would be to explicitly allocate things in
> the EventTriggerCacheContext, rather than switching the context.
> However, it's not 100% clear to me that this is really a problem with
> the event trigger patch itself. RelationBuildDesc is doing an allocation
> in whatever the caller's context is, and freeing it. But in the
> meantime, it does it's own catalog accesses and otherwise executes quite
> a lot of code, and assumes that the memory context will stay intact the
> entire time. Maybe that's a reasonable assumption, but as seen here, it
> leaves room for error.
Yeah. Essentially, BuildRelationDesc wants the caller's memory
context to be short-lived (so that any leaks go away) but not so
short-lived that the tuple can get clobbered before the function gets
done using it. It's worked up until now but it seems kinda fragile.
However, even if we fixed that, I don't think it would fix this
problem completely, because BuildEventTriggerCache() is itself
unprepared for the possibility of a cache reset in the middle of a
After some thought, I think the right fix is to add a three-valued
flag indicating whether the cache is (1) stale (the starting state),
(2) rebuild started, or (3) valid.
(1) At the beginning of BuildEventTriggerCache, we unconditionally set
the state to "rebuild started". At the end, if the state is still
"rebuild started", we set it to "valid".
(2) InvalidateEventCacheCallback should blow the cache away only if
the current state is something other than "rebuild started". It
should then unconditionally set the flag to "stale".
(3) EventCacheLookup should rebuild the cache whenever the state is not "valid".
That way, if an invalidation event arrives in the middle of a rebuild,
we finish the rebuild without blowing up, but leave the cache marked
as stale, so that the next lookup will rebuild it again. The "rebuild
started" state could exist even when no rebuild is in progress if the
rebuild errors out, but it doesn't really matter; the next rebuild
will fix it.
Anyone see a hole in that, or have better ideas?
> While looking at it, I observe that no context is passed to
> hash_create(), and no hash_destroy() is called. Unless I'm missing
> something, that's a leak in TopMemoryContext.
I've pushed a fix for this; thanks for catching it.
The Enterprise PostgreSQL Company
In response to
pgsql-bugs by date
|Next:||From: Todd Kover||Date: 2012-08-08 00:10:25|
|Subject: renaming+recreating table w/default sequence causes dependency seq issue|
|Previous:||From: Tom Lane||Date: 2012-08-07 19:02:48|
|Subject: Re: BUG #7483: uuid-ossp does not compile on OS X 10.8|