Re: EINTR in ftruncate()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Chris Travers <chris(dot)travers(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: EINTR in ftruncate()
Date: 2022-07-01 22:17:22
Message-ID: 20220701221722.os4ktbe5pnciqguv@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-07-02 09:52:33 +1200, Thomas Munro wrote:
> On Sat, Jul 2, 2022 at 9:06 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > Chris, do you have any additional details about the machine that lead to this
> > change? OS version, whether it might have been swapping, etc?
> >
> > I wonder if what happened is that posix_fallocate() used glibc's fallback
> > implementation because the kernel was old enough to not support fallocate()
> > for tmpfs. Looks like support for fallocate() for tmpfs was added in 3.5
> > ([1]). So e.g. a rhel 6 wouldn't have had that.
>
> With a quick test program on my Linux 5.10 kernel I see that an
> SA_RESTART signal handler definitely causes posix_fallocate() to
> return EINTR (can post trivial program).
>
> A drive-by look at the current/modern kernel source supports this:
> shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
> to be the Linux-y way to say you want EINTR or restart as
> appropriate?), and it also undoes all partial progress too (not too
> surprising), which would explain why a perfectly timed machine gun
> stream of signals from our recovery conflict system can make an
> fallocate retry loop never terminate, for large enough sizes.

Yea :(

And even if we fix recovery to not do douse other processes in signals quite
that badly, there are plenty other sources of signals that can arrive at a
steady clip. So I think we need to do something to defuse this another way.

Ideas:

1) do the fallocate in smaller chunks, thereby making it much more likely to
complete between two signal deliveries
2) block signals while calling posix_fallocate(). That won't work for
everything (e.g. rapid SIGSTOP/SIGCONT), but that's not something we'd send
ourselves, so whatever.
3) 1+2
4) ?

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-07-01 22:19:28 Re: Time to remove unparenthesized syntax for VACUUM?
Previous Message Nathan Bossart 2022-07-01 22:13:16 Re: Time to remove unparenthesized syntax for VACUUM?