Re: gharial segfaulting on REL_12_STABLE only

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gharial segfaulting on REL_12_STABLE only
Date: 2019-08-27 01:48:01
Message-ID: 3067.1566870481@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> This is apparently an EDB-owned machine but I have no access to it
> currently (I could ask if necessary). For some reason it's been
> failing for a week, but only on REL_12_STABLE, with this in the log:

Yeah, I've been puzzling over that to little avail.

> It's hard to see how cdc8d371e2, the only non-doc commit listed on the
> first failure, could have anything to do with that.

Exactly :-(. It seems completely reproducible since then, but how
could that have triggered a failure over here? And why only in this
branch? The identical patch went into HEAD.

> 2019-08-20 04:31:48.886 MDT [13421:4] LOG: server process (PID 13871)
> was terminated by signal 11: unrecognized signal
> 2019-08-20 04:31:48.886 MDT [13421:5] DETAIL: Failed process was
> running: SET default_table_access_method = '';

> Apparently HPUX's sys_siglist doesn't recognise that most popular of
> signals, 11, but by googling I see that it has its traditional meaning
> there.

HPUX hasn't *got* sys_siglist, nor strsignal() which is what we're
actually relying on these days (cf. pgstrsignal.c). I was puzzled
by that too to start with, though. I wonder if we shouldn't rearrange
pg_strsignal so that the message in the !HAVE_STRSIGNAL case is
something like "signal names not available on this platform" rather
than something that looks like we should've recognized it and didn't.
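
A minimal sketch of that rearrangement, for illustration only. It is not
the code in pgstrsignal.c: the function names are hypothetical,
HAVE_STRSIGNAL is assumed to be set by configure where strsignal()
exists, and the fallback wording is just the suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_strsignal(); NOT the code in pgstrsignal.c.
 * HAVE_STRSIGNAL is assumed to be defined by configure where strsignal()
 * is available.
 */
const char *
sketch_strsignal(int signum)
{
	const char *result = NULL;

#ifdef HAVE_STRSIGNAL
	result = strsignal(signum);
#endif

	/*
	 * No platform support (or no answer): say so plainly, instead of
	 * implying that we looked the signal up and failed to recognize it.
	 */
	if (result == NULL)
		result = "(signal names not available on this platform)";

	return result;
}

/* Hypothetical helper: build a log line shaped like the one quoted above. */
int
sketch_format_termination(char *buf, size_t len, int pid, int signum)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
					"server process (PID %d) was terminated by signal %d: %s",
					pid, signum, sketch_strsignal(signum));
}
```

On a platform without strsignal(), the log line would then end with the
"not available" text rather than "unrecognized signal".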

> 2019-08-20 04:31:22.422 MDT [13871:34] pg_regress/create_am LOG:
> statement: SET default_table_access_method = '';

> Perhaps it was really running the next statement.

Hard to see how, because this should have reported

ERROR: invalid value for parameter "default_table_access_method": ""
DETAIL: default_table_access_method cannot be empty.

but it didn't get that far. It seems like it must have died either
in the (utterly trivial) check that leads to the above-quoted
complaint, or somewhere in the ereport mechanism. Neither theory
seems very credible.
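
To show how little there is to go wrong in that check, here is a
minimal sketch of its shape. The names are hypothetical stand-ins, not
the PostgreSQL source; the real hook reports the detail through the GUC
error machinery rather than a static variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the detail message a check hook would set. */
static const char *sketch_errdetail = NULL;

/*
 * Hypothetical stand-in for the check on default_table_access_method.
 * Only the shape of the test is meant to be representative.
 */
bool
sketch_check_default_table_am(const char *newval)
{
	assert(newval != NULL);

	/* An empty string is rejected before any catalog lookup happens. */
	if (newval[0] == '\0')
	{
		sketch_errdetail = "default_table_access_method cannot be empty.";
		return false;
	}
	return true;
}

/* Hypothetical accessor for the detail message recorded above. */
const char *
sketch_last_errdetail(void)
{
	return sketch_errdetail;
}
```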

The seeming action-at-a-distance nature of the failure has me
speculating about compiler or linker bugs, but I dislike
jumping to that type of conclusion without hard evidence.

A stack trace would likely be really useful right about now.

regards, tom lane
