Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit
Date: 2019-06-18 19:18:52
Message-ID: 20190618191852.aqmw5dt3milodkqd@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-12-17 15:35:01 -0800, Andres Freund wrote:
> On 2018-12-16 13:48:00 -0800, Andres Freund wrote:
> > On 2018-12-17 08:25:38 +1100, Thomas Munro wrote:
> > > On Mon, Dec 17, 2018 at 7:57 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > The interesting bit is that if I replace the _exit(2) in
> > > > bgworker_quickdie() with an exit(2) (i.e. processing atexit handlers),
> > > > or manually add an OPENSSL_cleanup() before the _exit(2), valgrind
> > > > doesn't find errors.
> > >
> > > Weird. Well I can see that there were bugs last year where OpenSSL
> > > failed to clean up its thread locals[1], and after they fixed that,
> > > cases where it bogusly cleaned up someone else's thread locals[2].
> > > Maybe there is some interference between pthread keys or something
> > > like that.
> > >
> > > [1] https://github.com/openssl/openssl/issues/3033
> > > [2] https://github.com/openssl/openssl/issues/3584
> >
> > What confuses the heck out of me is that it happens on _exit(). Those
> > issues ought to be only visible when doing exit(), no?
> >
> > I guess there's also a good argument to make that valgrind running its
> > intercept in the _exit() case is a bit dubious (given that's going to be
> > used in cases where e.g. a signal handler might have interrupted a
> > malloc), but given the stacktraces here I don't think that can be the
> > cause.
>
> I've for now put --run-libc-freeres=no into skink's config. Locally that
> "fixes" the issue for me, but of course is not a proper solution. But I
> want to see whether that allows running all tests under valgrind.

Turns out to be caused by a glibc bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=24476

The reason it only fails if SSL is enabled, and only after the OpenSSL
randomness was integrated, is that OpenSSL's randomness initialization
creates a TLS variable, which glibc then frees accidentally (as it tries
to free something not initialized).

Thus this can be "worked around" by doing something like
shared_preload_libraries=pg_stat_statements, as dlopening a library
initializes the relevant state.

Greetings,

Andres Freund
