Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Nico Williams <nico(at)cryptonector(dot)com>, Asim R P <apraveen(at)pivotal(dot)io>, Jimmy Yih <jyih(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Date: 2018-07-20 15:03:23
Message-ID: 20180720150323.v2uschqaedovm7ep@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018-07-20 11:53:32 +0300, Heikki Linnakangas wrote:
> ISTM that no-one has any great ideas on what to do about the ereport() in
> quickdie().

I think we actually have some decent ideas how to make that less
problematic in a number of cases. Avoid translation (thereby avoiding
direct malloc()) and rely on the fact that the error message evaluation
happens in a memory context with pre-allocated memory. That still
leaves openssl, but is better than nothing. If we wanted, we
additionally could force ssl allocations to happen through another
context with preallocated memory.

> But I think we have consensus on replacing the exit(2) calls
> with _exit(2). If we do just that, it would be better than the status quo,
> even if it doesn't completely fix the problem. This would prevent the case
> that Asim reported, for starters.

Right. You're planning to backpatch?

> With _exit(), I think we wouldn't really need the "PG_SETMASK(&BlockSig);"
> calls in the aux process *_quickdie handlers that don't do anything else
> than call _exit(). But I didn't dare to remove them yet.

I think we should remove it at least in master.

> diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
> index 960d3de204..010d53a6ce 100644
> --- a/src/backend/postmaster/bgwriter.c
> +++ b/src/backend/postmaster/bgwriter.c
> @@ -404,22 +404,21 @@ bg_quickdie(SIGNAL_ARGS)
> /*
> * We DO NOT want to run proc_exit() callbacks -- we're here because
> * shared memory may be corrupted, so we don't want to try to clean up our
> - * transaction. Just nail the windows shut and get out of town. Now that
> - * there's an atexit callback to prevent third-party code from breaking
> - * things by calling exit() directly, we have to reset the callbacks
> - * explicitly to make this work as intended.
> - */
> - on_exit_reset();
> -
> - /*
> - * Note we do exit(2) not exit(0). This is to force the postmaster into a
> - * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
> - * backend. This is necessary precisely because we don't clean up our
> - * shared memory state. (The "dead man switch" mechanism in pmsignal.c
> - * should ensure the postmaster sees this as a crash, too, but no harm in
> - * being doubly sure.)
> + * transaction. Just nail the windows shut and get out of town.
> + *
> + * There's an atexit callback to prevent third-party code from breaking
> + * things by calling exit() directly. We don't want to trigger that, so
> + * we use _exit(), which doesn't run atexit callbacks, rather than exit().
> + * And exit() wouldn't be safe to run from a signal handler, anyway.

It's much less the exit() that's unsafe, than the callbacks themselves,
right? Perhaps just restate that we wouldn't want to trigger atexit
processing due to signal safety?

> + * Note we use _exit(2) not _exit(0). This is to force the postmaster
> + * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
> + * random backend. This is necessary precisely because we don't clean up
> + * our shared memory state. (The "dead man switch" mechanism in
> + * pmsignal.c should ensure the postmaster sees this as a crash, too, but
> + * no harm in being doubly sure.)
> */
> - exit(2);
> + _exit(2);
> }

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-07-20 15:09:13 Re: Non-portable shell code in pg_upgrade tap tests
Previous Message Andres Freund 2018-07-20 14:58:36 Re: [HACKERS] logical decoding of two-phase transactions