Re: Checkpoint not retrying failed fsync?

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Subject: Re: Checkpoint not retrying failed fsync?
Date: 2018-06-12 03:31:07
Message-ID: CAEepm=21AfdkGTwaaWfde0J0u0ggmbP6TOO3pNB8tX8uSOTZ2w@mail.gmail.com
Lists: pgsql-hackers

On Sun, Apr 8, 2018 at 9:17 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> New patch attached.

For Linux users (read: almost all users) this patch on its own is a
bit like rearranging the deck chairs on the Titanic. Since retrying
after a failed fsync() is useless there, among other problems, Craig
and Andres are working on converting IO errors into PANICs and
overhauling the request queue[1].

I was about to mark this patch "rejected" and forget about it, since
Craig's patch makes it redundant. But then I noticed that Craig's
patch doesn't actually remove the retry behaviour completely: it
promotes only EIO and ENOSPC to PANIC. If you somehow manage to get
any other errno from fsync(), the checkpoint will fail and you'll be
exposed to this bug: PostgreSQL forgets that the segment was dirty, so
the next checkpoint might fsync nothing at all and then bogusly report
success. So, I think either Craig's patch should PANIC on *any* error
from fsync(), or we should commit this patch too so that we remember
which segments are dirty next time around.
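
To make the failure mode concrete, here is a toy model in Python. It is only a sketch, not the md.c code or either patch: the class ToyCheckpointer, the remember_on_failure flag, the segment name "seg.1" and the choice of EINTR as the "other errno" are all made up for illustration. It assumes the pending requests are forgotten before fsync is attempted, which is the behaviour described above.

```python
import errno


class ToyCheckpointer:
    """Toy model of a pending-fsync table (hypothetical, not PostgreSQL code)."""

    def __init__(self, fsync, remember_on_failure):
        self.fsync = fsync  # callable(segment); raises OSError on failure
        self.remember_on_failure = remember_on_failure
        self.pending = set()

    def mark_dirty(self, segment):
        self.pending.add(segment)

    def checkpoint(self):
        """Fsync every pending segment and return how many were synced."""
        todo = sorted(self.pending)
        self.pending.clear()  # assumption: requests are forgotten up front
        synced = 0
        for i, segment in enumerate(todo):
            try:
                self.fsync(segment)
            except OSError:
                if self.remember_on_failure:
                    # Put back the failed segment and everything not yet tried.
                    self.pending.update(todo[i:])
                raise
            synced += 1
        return synced


def run(remember_on_failure):
    """One failed checkpoint followed by a second attempt."""
    state = {"failed_once": False}

    def flaky_fsync(segment):
        # Fails exactly once with an errno other than EIO/ENOSPC (arbitrary pick).
        if not state["failed_once"]:
            state["failed_once"] = True
            raise OSError(errno.EINTR, "simulated fsync failure")

    cp = ToyCheckpointer(flaky_fsync, remember_on_failure)
    cp.mark_dirty("seg.1")
    try:
        cp.checkpoint()
        first_ok = True
    except OSError:
        first_ok = False
    second_synced = cp.checkpoint()  # does not raise: flaky_fsync fails only once
    return first_ok, second_synced


print(run(remember_on_failure=False))
print(run(remember_on_failure=True))
```

With remember_on_failure=False the second checkpoint returns without error even though it synced zero segments, which is the bogus success. With remember_on_failure=True the failed segment is still pending and gets fsynced on the next attempt.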

[1] https://www.postgresql.org/message-id/flat/CAMsr%2BYFPeKVaQ57PwHqmRNjPCPABsdbV%3DL85he2dVBcr6yS1mA%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
