Re: Re: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: jimmy <mpokky(at)126(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker
Date: 2018-08-22 04:47:30
Message-ID: CAEepm=30uOeesrmZWBj6zFh-E2hByJseyoM2ZtUS2r0E5G9zyA@mail.gmail.com
Lists: pgsql-bugs

On Wed, Aug 22, 2018 at 2:54 PM, jimmy <mpokky(at)126(dot)com> wrote:
> This is the debug log below. Is it useful. Thank you.

That's not showing the path that reaches the error. If it's happening
in a parallel worker, that'll probably be tricky to catch with a
breakpoint. Are you able to recompile PostgreSQL? If you could do
that after changing all cases of elog(ERROR, "invalid cache ID: %d",
cacheId) to PANIC instead of ERROR, and then start it with ulimit -c
unlimited, you might get a core file that you can load into a debugger
to see how we reached it.
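
To illustrate why switching the level matters, here is a small standalone
sketch, not PostgreSQL code. It assumes a POSIX system; toy_elog() and
toy_terminating_signal() are hypothetical names made up for this example.
It relies on the fact that PANIC ends in abort(), which raises SIGABRT,
and SIGABRT's default action is to dump core (subject to ulimit -c),
whereas an ERROR leaves the process running.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for PostgreSQL's ERROR and PANIC levels. */
enum toy_level { TOY_ERROR, TOY_PANIC };

/*
 * Toy version of elog(): always prints the message, but only the PANIC
 * level calls abort(). In real PostgreSQL an ERROR unwinds to an error
 * handler instead of simply returning; that part is not modelled here.
 */
void toy_elog(enum toy_level level, int cache_id)
{
    assert(level == TOY_ERROR || level == TOY_PANIC);
    fprintf(stderr, "%s: invalid cache ID: %d\n",
            level == TOY_PANIC ? "PANIC" : "ERROR", cache_id);
    if (level == TOY_PANIC)
        abort();            /* raises SIGABRT; default action dumps core */
}

/*
 * Run toy_elog() in a child process and report which signal, if any,
 * terminated it. Returns 0 for a normal exit and -1 on fork/wait failure.
 */
int toy_terminating_signal(enum toy_level level)
{
    int   status;
    pid_t pid;

    fflush(NULL);           /* avoid duplicated buffered output after fork */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        toy_elog(level, 42);
        _exit(0);           /* reached only when toy_elog() returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With this sketch, the TOY_PANIC child is killed by SIGABRT, which is the
event that leaves a core file behind when core dumps are enabled, while
the TOY_ERROR child exits normally and leaves nothing to inspect.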

It's a strange error. I don't think it can be coming from these
places in inval.c:

    if (cacheid < 0 || cacheid >= SysCacheSize)
        elog(ERROR, "invalid cache ID: %d", cacheid);

... because we can see that it's 42 (PROCNAMEARGSNSP, a valid cache
ID), and SysCacheSize is a compile-time constant greater than 42. So
it must be coming from one of the places in syscache.c that look like
this:

    if (cacheId < 0 || cacheId >= SysCacheSize ||
        !PointerIsValid(SysCache[cacheId]))
        elog(ERROR, "invalid cache ID: %d", cacheId);

Since InitCatalogCache() puts a non-NULL pointer into every index from
0 to SysCacheSize - 1 without gaps (or it errors out if it fails while
trying), it seems like either InitCatalogCache() didn't run, or
SysCache[42] has later been overwritten with NULL? I wondered if
there is some way for a parallel worker to reach shared invalidation
message processing code before InitCatalogCache() has run, but
that doesn't seem to be an issue: SysCacheInvalidate() quietly
tolerates that.
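
The difference between the two kinds of check can be sketched with a toy
model. This is not the real syscache.c: every toy_* name and the value of
TOY_SYSCACHE_SIZE are assumptions made up for illustration, and only the
fact that the size exceeds 42 matters. It shows that ID 42 always passes
the range-only check, but fails the range-plus-pointer check for as long
as its slot is still NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in PostgreSQL. */
#define TOY_SYSCACHE_SIZE   78  /* assumed value, only "> 42" matters */
#define TOY_PROCNAMEARGSNSP 42

/* The size is a compile-time constant, so this is checked at build time. */
static_assert(TOY_PROCNAMEARGSNSP < TOY_SYSCACHE_SIZE,
              "cache ID 42 must be in range");

typedef struct ToyCatCache { int id; } ToyCatCache;

static ToyCatCache  toy_storage[TOY_SYSCACHE_SIZE];
static ToyCatCache *toy_syscache[TOY_SYSCACHE_SIZE]; /* NULL until init */

/* Mimics the idea of InitCatalogCache(): fill every slot, no gaps. */
void toy_init_catalog_cache(void)
{
    for (int i = 0; i < TOY_SYSCACHE_SIZE; i++)
    {
        toy_storage[i].id = i;
        toy_syscache[i] = &toy_storage[i];
    }
}

/* Range-only test, like the first snippet quoted above. */
bool toy_range_check_ok(int cacheid)
{
    return !(cacheid < 0 || cacheid >= TOY_SYSCACHE_SIZE);
}

/* Range plus non-NULL slot, like the second snippet quoted above. */
bool toy_lookup_check_ok(int cacheId)
{
    return !(cacheId < 0 || cacheId >= TOY_SYSCACHE_SIZE ||
             toy_syscache[cacheId] == NULL);
}
```

In this model the only way for toy_lookup_check_ok(42) to fail is a NULL
slot, i.e. the lookup ran before toy_init_catalog_cache() or the slot was
cleared afterwards, which mirrors the two possibilities raised above.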

I wonder how we could reach one of SearchSysCache(PROCNAMEARGSNSP,
...), SysCacheGetAttr(PROCNAMEARGSNSP, ...),
GetSysCacheHashValue(PROCNAMEARGSNSP, ...),
SearchSysCacheList(PROCNAMEARGSNSP, ...) before InitCatalogCache() has
finished? The answer probably involves oracle_fdw.

Ahh, how about this line here:

https://github.com/laurenz/oracle_fdw/blob/master/oracle_fdw.c#L6237

    catlist = SearchSysCacheList2(
        PROCNAMEARGSNSP,
        CStringGetDatum("geometry_recv"),
        PointerGetDatum(buildoidvector(argtypes, argcount)));

I don't immediately see how that can be reached before
InitCatalogCache() has run, though.

--
Thomas Munro
http://www.enterprisedb.com
