Re: EINTR in ftruncate()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: EINTR in ftruncate()
Date: 2022-07-06 20:56:33
Message-ID: CA+hUKGKS2Radu-1Ewhe1-LEj19C-3XAQ7wnkQMb4e9E9q9ZXSg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 7, 2022 at 8:39 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-07-06 21:29:41 +0200, Alvaro Herrera wrote:
> > On 2022-Jul-05, Andres Freund wrote:
> >
> > > I think we'd be better off disabling at least some signals during
> > > dsm_impl_posix_resize(). I'm afraid we'll otherwise just find another
> > > variation of these problems. I haven't checked the source of ftruncate, but
> > > what Thomas dug up for fallocate makes it pretty clear that our current
> > > approach of just retrying again and again isn't good enough. It's a bit more
> > > obvious that it's a problem for fallocate, but I don't think it's worth having
> > > different solutions for the two.
> >
> > So what if we move the retry loop one level up? As in the attached.
> > Here, if we get EINTR then we retry both syscalls.
>
> Doesn't really seem to address the problem to me. posix_fallocate()
> takes some time (~1s for 3GB roughly), so if we signal at a higher rate,
> we'll just get stuck.
>
> I hacked a bit on a test program from Thomas, and it's pretty clear
> that with a 5ms timer interval you'll pretty much not make
> progress. It's much easier to get fallocate() to get interrupted than
> ftruncate(), but the latter gets interrupted e.g. when you do a strace
> in the "wrong" moment (afaics SIGSTOP/SIGCONT trigger EINTR in
> situations that are retried otherwise).
>
> So I think we need: 1) block most signals, 2) a retry loop *without*
> interrupt checks.

Yeah. I was also wondering about wrapping the whole function in
PG_SETMASK(&BlockSig) ... PG_SETMASK(&UnBlockSig), while still keeping
the while (rc == EINTR) loop (without the check for *Pending
variables). The loop would remain only because otherwise, when you
attach a debugger and continue, you'll get a spurious EINTR and it'll
interfere with program execution. All blockable signals would be
blocked *except* SIGQUIT, which means that fast shutdown/crash will
still work. It seems nice to keep that as a way to interrupt it
without resorting to SIGKILL.
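
To make the shape of that concrete, here is a rough standalone sketch
using plain POSIX calls rather than the PG_SETMASK/BlockSig machinery.
The function name resize_with_signals_blocked() is hypothetical, and
the mask built here (everything except SIGQUIT) is only an
approximation of what BlockSig contains, so treat it as an
illustration of the idea, not as the proposed patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: resize fd to "size" bytes and
 * reserve the space, with all blockable signals except SIGQUIT blocked.
 * Returns 0 on success or an errno value on failure.
 */
int
resize_with_signals_blocked(int fd, off_t size)
{
    sigset_t    block;
    sigset_t    saved;
    int         rc;

    /* Block everything blockable, but leave SIGQUIT deliverable. */
    sigfillset(&block);
    sigdelset(&block, SIGQUIT);
    if (sigprocmask(SIG_SETMASK, &block, &saved) != 0)
        return errno;

    /*
     * Retry on EINTR without any interrupt checks.  ftruncate() reports
     * failure as -1 with errno set.
     */
    do
    {
        rc = (ftruncate(fd, size) == 0) ? 0 : errno;
    } while (rc == EINTR);

    /* posix_fallocate() returns the error number directly. */
    if (rc == 0)
    {
        do
        {
            rc = posix_fallocate(fd, 0, size);
        } while (rc == EINTR);
    }

    /* Restore the caller's signal mask. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}
```

With the mask in place the loops should only ever spin on the
stop/continue kind of EINTR (debugger attach, strace), so they cannot
get stuck behind a high-rate timer signal the way the current code can.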
