Re: fairywren failures

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: fairywren failures
Date: 2019-10-03 16:17:52
Message-ID: 20191003161752.ylp3ppdry2onhiua@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-10-03 08:23:49 -0700, Andres Freund wrote:
> On 2019-10-03 08:18:42 -0700, Andres Freund wrote:
> > This is around where an error is thrown:
> > -- badly formatted interval
> > INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted interval');
> > -ERROR: invalid input syntax for type interval: "badly formatted interval"
> > -LINE 1: INSERT INTO INTERVAL_TBL (f1) VALUES ('badly formatted inter...
> > - ^
> >
> > and the error is stack related. So I suspect that setjmp/longjmp might
> > be to blame here, and somehow don't save/restore the stack into a proper
> > state. I don't know enough about mingw/msys/windows to know whether that
> > uses a self-written setjmp or relies on the MS implementation.
> >
> > If you could gather a backtrace it might help us. It's possible that the
> > stack is "just" misaligned or something, we had problems with that
> > before (IIRC valgrind didn't always align stacks correctly for processes
> > that forked from within a signal handler, which then crashed when using
> > instructions with alignment requirements, but only sometimes, because
> > the stack could be aligned).
>
> It seems we're not the only ones hitting this:
> https://rt.perl.org/Public/Bug/Display.html?id=133603
>
> Doesn't look like they've really narrowed it down that much yet.

A few notes:

* As an experiment, it could be worthwhile to try redefining
sigsetjmp/longjmp/sigjmp_buf in terms of what
https://gcc.gnu.org/onlinedocs/gcc/Nonlocal-Gotos.html
provides. It's apparently a separate implementation from the MS CRT one.

* Arguably
"Do not use longjmp to transfer control from a callback routine
invoked directly or indirectly by Windows code."
and
"Do not use longjmp to transfer control out of an interrupt-handling
routine unless the interrupt is caused by a floating-point
exception. In this case, a program may return from an interrupt
handler via longjmp if it first reinitializes the floating-point math
package by calling _fpreset."

from https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/longjmp?view=vs-2019

might be violated by our signal emulation on Windows. But I've
not looked into that in detail.

* Any chance you could get the pre-processed source for postgres.c or
such? I'm kinda wondering if the definition of setjmp() that we get
includes the returns_twice attribute that gcc wants to see, and
whether we're picking up the mingw version of longjmp, or the windows
one.

https://sourceforge.net/p/mingw-w64/mingw-w64/ci/844cb490ab2cc32ac3df5914700564b2e40739d8/tree/mingw-w64-headers/crt/setjmp.h#l31

* It's certainly curious that the failures so far have only happened as
part of pg_upgradeCheck, rather than the plain regression tests.
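
For the first note, a minimal sketch of what such a redefinition might
look like, assuming GCC's __builtin_setjmp/__builtin_longjmp as described
on the Nonlocal-Gotos page. All sketch_* names, raise_error() and
run_with_error_trap() are hypothetical and only for illustration; this is
not how PostgreSQL defines these today. The builtins take a five-word
buffer, have no savemask argument (it is simply dropped here), and
__builtin_longjmp only accepts 1 as its value.

```c
#include <assert.h>

/* Hypothetical stand-ins for sigjmp_buf/sigsetjmp/siglongjmp. */
typedef void *sketch_sigjmp_buf[5];     /* GCC wants a five-word buffer */

/* savemask is ignored: the builtins have no equivalent argument. */
#define sketch_sigsetjmp(buf, savemask) __builtin_setjmp(buf)
/* val is ignored: __builtin_longjmp requires its second argument to be 1. */
#define sketch_siglongjmp(buf, val)     __builtin_longjmp((buf), 1)

static sketch_sigjmp_buf error_context;

/*
 * Jump back to the most recent sketch_sigsetjmp() on error_context.
 * Kept out of line because the builtin setjmp and longjmp should not
 * end up in the same function.
 */
static void __attribute__((noinline, noreturn))
raise_error(void)
{
    sketch_siglongjmp(error_context, 1);
}

/* Returns 0 if the body ran to completion, 1 if an error was trapped. */
static int
run_with_error_trap(int should_fail)
{
    if (sketch_sigsetjmp(error_context, 0) == 0)
    {
        if (should_fail)
            raise_error();
        return 0;
    }
    /* Reached only via raise_error(). */
    return 1;
}

/* Exercises both the normal path and the error path. */
void
error_trap_selfcheck(void)
{
    assert(run_with_error_trap(0) == 0);
    assert(run_with_error_trap(1) == 1);
}
```

If the crashes go away with something along these lines, that would point
at the CRT setjmp/longjmp path rather than at our own code.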

Greetings,

Andres Freund
