RE: [HACKERS] Sure enough, SI buffer overrun is broken

From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: RE: [HACKERS] Sure enough, SI buffer overrun is broken
Date: 2000-01-30 03:14:55
Message-ID: NDBBIJLOILGIKBGDINDFKEIECCAA.Inoue@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>
> "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> >> I built the current sources with MAXNUMMESSAGES set to 32 in
> >> src/include/storage/sinvaladt.h. The regular regress tests
> >> run OK, with just a few NOTICEs about 'cache state reset'
> >> and 'SI buffer overflow' inserted in the normal outputs
> >> (as you'd expect, if SI overrun occurs).
> >>
> >> However, the parallel tests crash spectacularly, with weird errors
> >> and Assert() coredumps.
>
> With these two changes in place, the parallel regress tests seem much
> more stable. There is still a big problem though, which is that the
> "portals" regress test sometimes fails. I traced this far enough to
> discover that the code is trying to use a TupleDesc that it's stored in
> the ScanTupleSlot of a plan, and this TupleDesc is no longer valid ---
> presumably the relcache entry it was gotten from has been flushed and
> rebuilt, leaving the plan with a dangling TupleDesc pointer. Ugh.
>

Oh great,I've also doubted relcache entry rebuild but wasn't able to
trace so far.

> I do not think it is very practical to try to change all of the places
> that assume that they can save pointers to the TupleDesc of a relcache
> entry. Instead I am thinking about solving the problem inside the
> relcache, as follows:
>
> During a relcache entry rebuild, do not simply free and reconstruct
> the TupleDesc. Instead, read the catalogs to build a new TupleDesc,
> and compare this one to the old one. If they are the same, free the
> new one instead (leaving the old one in place, and hence stored pointers
> to it are still valid). If they are not the same, then elog(ERROR).
>
> (elog may sound overly paranoid, but this condition indicates that the
> table's definition has actually changed since we grabbed the refcount on
> it --- remember we wouldn't be doing this at all if the relcache entry had
> zero refcount --- and therefore all kinds of derived information such as
> plans may be wrong. Pressing ahead will probably lead to crash.)
>

Sounds reasonable.

> We may need to do the same for any other substructures of the relcache
> entry that are visible from outside relcache.c.
>
> I know this sounds pretty grotty, but we are already doing it for the
> relcache entry itself --- rebuild re-uses the same physical entry
> rather than deleting and reallocating it, to ensure that pointers
> to an open Relation stay valid over an SI flush. We need to extend
> the same guarantee to the substructure of the relcache entry.
>
> Unless someone has a better idea, I'll work on that.
>

Agreed.

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-01-30 03:18:09 Re: [HACKERS] freefuncs.c is never called from anywhere!?
Previous Message Hiroshi Inoue 2000-01-30 03:14:53 RE: ImmediateSharedRelationCacheInvalidate considered harmful