Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Nico Williams <nico(at)cryptonector(dot)com>, Asim R P <apraveen(at)pivotal(dot)io>, Jimmy Yih <jyih(at)pivotal(dot)io>
Subject: Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Date: 2018-08-13 10:53:05
Message-ID: 61373517-e5d5-a718-75a4-a4022cc48db4@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/08/18 19:19, Heikki Linnakangas wrote:
> So, with this commit, the various SIGQUIT quickdie handlers are in a
> better shape. The regular backend's quickdie() handler still calls
> ereport(), which is not safe, but it's a step in the right direction.
> And we haven't addressed the original complaint, which was actually
> about startup_die(), not quickdie().
>
> What should we do about startup_die()? Ideas?

To recap: If a backend process receives a SIGTERM, or if the
authentication timeout expires, while the backend is reading the startup
packet, we call proc_exit(0) from the signal handler. proc_exit(0) is
clearly not async-safe, so if the process was busy modifying backend
state, for example doing a memory allocation, bad things can happen. In
the beginning of this thread, Jimmy demonstrated a self-deadlock, when
the backend was in the middle of proc_exit(), when the signal arrived,
and a nested call to proc_exit was made.

I think the proper fix here is to stop calling proc_exit(0) directly
from the startup_die() signal handler. Instead, let's install the
regular die() signal handler earlier, before reading the startup
process, and rely on the usual CHECK_FOR_INTERRUPTS() mechanism.

One annoying thing is the pg_getnameinfo_all() call, in
BackendInitialize(). If 'log_hostname' is enabled, it can perform a DNS
lookup, which can take a very long time. It would be nice to still be
able to interrupt that, but if we stop calling proc_exit() from the
signal handler, we won't. I don't think it's safe to interrupt it, like
we do today, but I wonder if the cure is worse than the disease.

We revamped the signal handling in backend processes in PostgreSQL 9.5,
so I'm inclined to only backpatch this to 9.5. In 9.3 and 9.4, let's
just add a '!proc_exit_in_progress' check to the signal handler, to
prevent the nested proc_eixt() call that Jimmy ran into from happening.
It won't fix the general problem, but since we haven't heard any other
reports about this, I think that's the right amount of effort for 9.3
and 9.4.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-08-13 10:57:05 Re: Get funcid when create function
Previous Message Andres Freund 2018-08-13 09:56:16 Re: Temporary tables prevent autovacuum, leading to XID wraparound