Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>, Nico Williams <nico(at)cryptonector(dot)com>
Cc: Asim R P <apraveen(at)pivotal(dot)io>, Jimmy Yih <jyih(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Date: 2018-07-20 08:53:32
Message-ID: 771a2a12-dcc3-a64e-f54d-b6bbc030d2f8@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19/07/18 23:13, Andres Freund wrote:
> On 2018-07-19 14:54:52 -0500, Nico Williams wrote:
>> On Thu, Jul 19, 2018 at 12:20:53PM -0700, Andres Freund wrote:
>>> On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote:
>>>> Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I
>>>> agree we should just _exit(2). All we want to do is to exit the process
>>>> immediately.
>>>
>>> Indeed.
>>>
>>>> bgworker_quickdie() makes some effort to block SIGQUIT during the exit()
>>>> processing, but that doesn't solve the whole problem. The process could've
>>>> been in the middle of a malloc/free when the signal arrived, for example.
>>>> exit() is simply not safe to call from a signal handler.
>>>
>>> Yea. I really don't understand why we have't been able to agree on this
>>> for *years* now.
>>
>> I mean, only calling async-signal-safe functions from signal handlers is
>> a critical POSIX requirement, and exit(3) is NOT async-signal-safe.
>
> There's a few standard requirements that aren't particularly relevant in
> practice (including some functions not being listed as signal
> safe). Just arguing from that isn't really helpful. But there's plenty
> hard evidence, including a few messages upthread!, of it being
> practically problematic. Just looking at the reasoning at why exit (and
> malloc) aren't signal safe however, makes it clear that this particular
> restriction should be adhered to, however.

ISTM that no-one has any great ideas on what to do about the ereport()
in quickdie(). But I think we have consensus on replacing the exit(2)
calls with _exit(2). If we do just that, it would be better than the
status quo, even if it doesn't completely fix the problem. This would
prevent the case that Asim reported, for starters.

Any objections to the attached?

With _exit(), I think we wouldn't really need the
"PG_SETMASK(&BlockSig);" calls in the aux process *_quickdie handlers
that don't do anything else than call _exit(). But I didn't dare to
remove them yet.

- Heikki

Attachment Content-Type Size
use-_exit-rather-than-exit-in-quickdie-1.patch text/x-patch 13.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-07-20 08:57:26 Re: de-deduplicate code in DML execution hooks in postgres_fdw
Previous Message AYahorau 2018-07-20 08:43:31 Adding TCP_USER_TIMEOUT support for libpq/psqlodbc