Re: BUG #16199: pg_restore stuck on interrupts

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: git(at)rmr(dot)ninja
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16199: pg_restore stuck on interrupts
Date: 2020-01-08 17:13:35
Message-ID: 9845.1578503615@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> We are seeing stuck pg_restore processes in several of our CI servers, both
> with PG10 (10.2) and PG11 (11.5).

You didn't actually say, but you must be interrupting parallel restores
with SIGINT or the like?

> I have some extra processes with the same issue (7 full stacks out of 20,
> the others are garbage) and, from what I see, they all have in common that
> the process has received a signal while it was doing a memory operation,
> either a malloc or a free:

Yeah. Ugh :-(

> I think it would be safer to use a similar approach to other processes, that
> is use the handler to only enable a global flag and check that in the main
> loop, but I'm having a hard time locating what the proper place to check the
> flag would be.
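For readers unfamiliar with the pattern the reporter describes, here is a minimal POSIX sketch. It is not pg_restore code: `interrupt_pending`, `flag_sigint_handler`, `install_flag_handler` and `run_items` are hypothetical names. The handler only stores to a `volatile sig_atomic_t`, and a work loop polls that flag between units of work. Any cleanup then runs in normal context, where `malloc()`/`free()` are safe to call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: written only by the handler, read by the work loop. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Async-signal-safe: the only action is a store to a sig_atomic_t. */
static void
flag_sigint_handler(int signo)
{
    (void) signo;
    interrupt_pending = 1;
}

/* Install the handler for SIGINT; returns 0 on success, -1 on error. */
static int
install_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = flag_sigint_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/*
 * Hypothetical work loop: handles up to n_items units of work, checking
 * the flag before each one. Returns how many units were completed.
 */
static int
run_items(int n_items)
{
    int done = 0;

    for (int i = 0; i < n_items; i++)
    {
        if (interrupt_pending)
            break;          /* normal context: cleanup may use malloc/free */
        done++;             /* stand-in for one unit of restore work */
    }
    return done;
}
```

The catch is that the flag is noticed only when control comes back to the check. A process blocked for a long time inside one unit of work, such as a long-running query, keeps going until that unit finishes, which is the timeliness concern raised in the reply below.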

I think the odds of that being an improvement are minimal --- you'd be
trading a risk of failure during exit for a risk of not exiting (in any
timely fashion) in the first place.

sigTermHandler tries to be safe to run in a signal context, but I'm
afraid we didn't think hard about what exit() might call. The way
I'd be inclined to fix this is to call _exit() instead of exit(),
and the heck with what any atexit handlers think. Can you try that
and see if it improves matters for you?
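A minimal sketch of the suggested change, assuming POSIX. `exit_now_handler` and `install_exit_handler` are hypothetical names and not the actual `sigTermHandler` source. `write()` and `_exit()` are on the POSIX async-signal-safe list. `exit()` is not, because it runs atexit handlers and flushes stdio buffers, and either step may reach `malloc()`/`free()` while the interrupted code holds the allocator lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a termination handler. */
static void
exit_now_handler(int signo)
{
    static const char msg[] = "terminated by signal\n";

    (void) signo;
    /* write() is async-signal-safe, unlike fprintf(). */
    (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
    /* _exit() skips atexit handlers and stdio flushing entirely. */
    _exit(1);
}

/* Install the handler for the given signal; returns 0 on success. */
static int
install_exit_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = exit_now_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

The trade-off is the one described above. Any atexit handler, and any output still buffered in stdio, is silently skipped. That is acceptable only if nothing those handlers do is required for correctness on an interrupted restore.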

regards, tom lane