From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: gharial segfaulting on REL_12_STABLE only
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> This is apparently an EDB-owned machine but I have no access to it
> currently (I could ask if necessary). For some reason it's been
> failing for a week, but only on REL_12_STABLE, with this in the log:
Yeah, I've been puzzling over that to little avail.
> It's hard to see how cdc8d371e2, the only non-doc commit listed on the
> first failure, could have anything to do with that.
Exactly :-(. It seems completely reproducible since then, but how
could that have triggered a failure over here? And why only in this
branch? The identical patch went into HEAD.
> 2019-08-20 04:31:48.886 MDT [13421:4] LOG: server process (PID 13871) was terminated by signal 11: unrecognized signal
> 2019-08-20 04:31:48.886 MDT [13421:5] DETAIL: Failed process was running: SET default_table_access_method = '';
> Apparently HPUX's sys_siglist doesn't recognise that most popular of
> signals, 11, but by googling I see that it has its traditional meaning
HPUX hasn't *got* sys_siglist, nor strsignal() which is what we're
actually relying on these days (cf. pgstrsignal.c). I was puzzled
by that too to start with, though. I wonder if we shouldn't rearrange
pg_strsignal so that the message in the !HAVE_STRSIGNAL case is
something like "signal names not available on this platform" rather
than something that looks like we should've recognized it and didn't.
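A minimal sketch of the rearrangement suggested above, assuming a standalone function rather than the real pg_strsignal() in pgstrsignal.c: `sketch_strsignal` and `format_exit_message` are hypothetical names, and `HAVE_STRSIGNAL` is deliberately left undefined so the fallback branch is what gets compiled, as it would be on a platform like HPUX. The fallback text follows the wording proposed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch, not PostgreSQL's actual code.  HAVE_STRSIGNAL stands
 * in for the configure symbol and is not defined here, so the #else branch
 * is compiled, mimicking a platform without strsignal().
 */
static const char *
sketch_strsignal(int signum)
{
#ifdef HAVE_STRSIGNAL
	const char *result = strsignal(signum);

	/* strsignal() may return NULL; only then is the number truly unknown. */
	if (result)
		return result;
	return "unrecognized signal";
#else
	(void) signum;				/* no name table to consult on this platform */
	/* Say plainly that names are unavailable, not that a lookup failed. */
	return "signal names not available on this platform";
#endif
}

/* Format a line shaped like the postmaster log output quoted earlier. */
static void
format_exit_message(char *buf, size_t buflen, int pid, int signum)
{
	assert(buf != NULL && buflen > 0);
	snprintf(buf, buflen,
			 "server process (PID %d) was terminated by signal %d: %s",
			 pid, signum, sketch_strsignal(signum));
}
```

Since the numeric signal value is printed alongside the text anyway, the fallback only needs to avoid suggesting that signal 11 was looked up and not found.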
> 2019-08-20 04:31:22.422 MDT [13871:34] pg_regress/create_am LOG: statement: SET default_table_access_method = '';
> Perhaps it was really running the next statement.
Hard to see how, because this should have reported
ERROR: invalid value for parameter "default_table_access_method": ""
DETAIL: default_table_access_method cannot be empty.
but it didn't get that far. It seems like it must have died either
in the (utterly trivial) check that leads to the above-quoted
complaint, or somewhere in the ereport mechanism. Neither theory
seems very credible.
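To make the "utterly trivial" point concrete, here is a standalone approximation of that kind of validation, assuming a simplified signature: `check_not_empty` is a hypothetical helper, not PostgreSQL's actual check hook for default_table_access_method, which reports through the GUC machinery rather than a caller-supplied buffer. The detail text matches the DETAIL message quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the emptiness check: it reads only the first
 * byte of the proposed value and, on failure, formats a detail message
 * into the caller's buffer.
 */
static bool
check_not_empty(const char *name, const char *newval,
				char *detail, size_t detaillen)
{
	assert(name != NULL && newval != NULL);
	assert(detail != NULL && detaillen > 0);

	if (newval[0] == '\0')
	{
		snprintf(detail, detaillen, "%s cannot be empty.", name);
		return false;
	}
	return true;
}
```

With a valid string pointer there is nothing here that could fault, which is why the error-reporting path looks like the more plausible place for the crash.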
The seeming action-at-a-distance nature of the failure has me
speculating about compiler or linker bugs, but I dislike
jumping to that type of conclusion without hard evidence.
A stack trace would likely be really useful right about now.
regards, tom lane