Re: Use of signal-unsafe functions from signal handlers

From: Mats Kindahl <mats(at)timescale(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Use of signal-unsafe functions from signal handlers
Date: 2022-05-24 12:10:36
Message-ID: CA+14426Zpdi6=jr9Z+C_tzS36pVa_+BbCHSeriqd=sLw4X6_iw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, May 24, 2022 at 12:14 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:

> Hi,
>
> On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
> >
> > Typically, signal-unsafe functions should not be called from signal
> > handlers. In particular, calling malloc() directly or indirectly can
> cause
> > deadlocks, making PostgreSQL unresponsive to signals.
> >
> > Unless I am missing something, bgworker_die uses ereport, which
> indirectly
> > calls printf-like functions, which are not signal-safe since they use
> > malloc(). In rare cases, this can lead to deadlocks with stacks that look
> > like this (from https://github.com/timescale/timescaledb/issues/4200):
> >
> > #0 0x00007f0e4d1040eb in __lll_lock_wait_private () from
> > target:/lib/x86_64-linux-gnu/libc.so.6
> > [...]
> > #3 malloc (size=53)
> > [...]
> > #7 0x000055b9212235b1 in errmsg ()
> > #8 0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
> > /build/timescaledb/src/bgw/scheduler.c:841
> > #9 <signal handler called>
> > [...]
> > #13 free (ptr=<optimized out>)
> > #14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
> > target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > [...]
>
> As far as I can see the problem comes from your handle_sigterm:
>
> static void handle_sigterm(SIGNAL_ARGS)
> {
> /*
> * do not use a level >= ERROR because we don't want to exit here
> but
> * rather only during CHECK_FOR_INTERRUPTS
> */
> ereport(LOG,
> (errcode(ERRCODE_ADMIN_SHUTDOWN),
> errmsg("terminating TimescaleDB job scheduler due
> to administrator command")));
> die(postgres_signal_arg);
> }
>
> As mentioned in the topmost comment of elog.c, there's an escape hatch for
> out of memory situations, to make sure that a small enough message can be
> displayed without trying to allocate memory. But this is only for ERROR or
> higher levels. Using an ereport(LOG, ...) level in a sigterm handler
> indeed
> isn't safe.
>

Yes, and we have already fixed this in the TimescaleDB code, so this is not
an issue for us (https://github.com/timescale/timescaledb/pull/4323). (We
actually replace the bgworker_die with just die as signal handler.)

However, bgworker_die()---which is part of PostgreSQL and is the default
signal handler for background workers---is using ereport() and I think this
should be a problem for all error levels since the problem is in the usage
of the formatting function to format the error message, not where you write
it. This would mean that any existing background workers would have a
window for triggering this issue on shutdown.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Julien Rouhaud 2022-05-24 12:37:39 Re: Use of signal-unsafe functions from signal handlers
Previous Message PG Bug reporting form 2022-05-24 11:04:18 BUG #17494: High demand for displacement sort