Re: checkpointer: PANIC: could not fsync file: No such file or directory

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer: PANIC: could not fsync file: No such file or directory
Date: 2019-11-28 21:50:36
Message-ID: CA+hUKGLw0RnKguqVvyWsJJR+2KHR9AyBqhbOuzdHjq9_XPokzA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 29, 2019 at 3:13 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Wed, Nov 27, 2019 at 7:53 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > 2019-11-26 23:41:50.009-05 | could not fsync file "pg_tblspc/16401/PG_12_201909212/16460/973123799.10": No such file or directory
>
> I managed to reproduce this (see below). I think I know what the
> problem is: mdsyncfiletag() uses _mdfd_getseg() to open the segment to
> be fsync'd, but that function opens all segments up to the one you
> requested, so if a lower-numbered segment has already been unlinked,
> it can fail. Usually that's unlikely because it's hard to get the
> request queue to fill up and therefore hard to split up the cancel
> requests for all the segments for a relation, but your workload and
> the repro below do it. In fact, the path shown in the error
> message is not the problem file at all; it's the one it really
> wanted, but it failed while first trying to open lower-numbered ones. I can see a
> couple of solutions to the problem (unlink in reverse order, send all
> the forget messages first before unlinking anything, or go back to
> using a single atomic "forget everything for this rel" message instead
> of per-segment messages), but I'll have to think more about that
> tomorrow.
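To make the failure mode described above concrete, here is a toy model in C. It is a hypothetical sketch, not code from md.c: `seg_exists[]`, `open_seg_chain()` and `fsync_segment()` are invented names, and `open_seg_chain()` only imitates the behaviour attributed to `_mdfd_getseg()` above, opening every lower-numbered segment before the requested one. It shows why the error can name a segment file that still exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NSEGS 11 /* hypothetical relation with segments 0..10 */

/* Hypothetical stand-in for the segment files on disk. */
static bool seg_exists[NSEGS];

/*
 * Imitates the behaviour described for _mdfd_getseg(): to reach segment
 * `target`, every lower-numbered segment is opened first.  Returns -1 on
 * success, otherwise the number of the first segment that is missing.
 */
static int open_seg_chain(int target)
{
    assert(target >= 0 && target < NSEGS);
    for (int seg = 0; seg <= target; seg++)
        if (!seg_exists[seg])
            return seg;
    return -1;
}

/*
 * Hypothetical fsync step.  On failure the error names the segment that
 * was *requested*, not the lower-numbered one that was actually missing.
 * Returns true on success.
 */
static bool fsync_segment(int target, char *errbuf, size_t errlen)
{
    int missing = open_seg_chain(target);

    if (missing < 0)
        return true;
    snprintf(errbuf, errlen,
             "could not fsync file \"rel.%d\": No such file or directory",
             target);
    return false;
}
```

With all eleven segments present, `fsync_segment(10, ...)` succeeds. After setting `seg_exists[0] = false`, `open_seg_chain(10)` returns 0, yet the error text names `rel.10`, mirroring the misleading path in the log line quoted above.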

Here is a patch that fixes the problem by sending all the
SYNC_FORGET_REQUEST messages up front, before unlinking any segment.
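The ordering change can be sketched with the same kind of toy model. Again this is hypothetical and is not the patch itself (whose filename suggests the change is in `mdunlinkfork()`). All names are invented, forget requests are assumed to take effect immediately, and the checkpointer is assumed to run a sync pass after every step of the unlinking backend, which is the worst-case interleaving. `unlink_interleaved()` forgets and unlinks one segment at a time; `unlink_forget_first()` forgets every segment before unlinking any.

```c
#include <assert.h>
#include <stdbool.h>

#define NSEGS 11 /* hypothetical relation with segments 0..10 */

static bool seg_exists[NSEGS];    /* segment files on disk */
static bool fsync_pending[NSEGS]; /* checkpointer's pending fsync requests */

/* Chain-opening behaviour as described for _mdfd_getseg(). */
static bool open_seg_chain(int target)
{
    assert(target >= 0 && target < NSEGS);
    for (int seg = 0; seg <= target; seg++)
        if (!seg_exists[seg])
            return false;
    return true;
}

/* One checkpointer sync pass: try every pending fsync, count failures.
   Simplification: requests stay pending until a forget removes them. */
static int checkpointer_pass(void)
{
    int failures = 0;
    for (int seg = 0; seg < NSEGS; seg++)
        if (fsync_pending[seg] && !open_seg_chain(seg))
            failures++;
    return failures;
}

static void reset(void)
{
    for (int seg = 0; seg < NSEGS; seg++)
        seg_exists[seg] = fsync_pending[seg] = true;
}

/* Old order: forget and unlink one segment at a time. */
static int unlink_interleaved(void)
{
    int failures = 0;
    reset();
    for (int seg = 0; seg < NSEGS; seg++) {
        fsync_pending[seg] = false; /* forget request for seg */
        seg_exists[seg] = false;    /* unlink seg */
        failures += checkpointer_pass();
    }
    return failures;
}

/* Patched order: every forget request first, then the unlinks. */
static int unlink_forget_first(void)
{
    int failures = 0;
    reset();
    for (int seg = 0; seg < NSEGS; seg++) {
        fsync_pending[seg] = false;
        failures += checkpointer_pass();
    }
    for (int seg = 0; seg < NSEGS; seg++) {
        seg_exists[seg] = false;
        failures += checkpointer_pass();
    }
    return failures;
}
```

In this model `unlink_interleaved()` reports 55 failed fsync attempts for an 11-segment relation (10 + 9 + ... + 0), while `unlink_forget_first()` reports none, because by the time any file disappears no fsync request for the relation is still pending. Unlinking in reverse order, another option mentioned above, would also avoid failures here, since a pending segment would never have a missing lower-numbered one.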

Attachment: 0001-Fix-ordering-bug-in-mdunlinkfork.patch (application/octet-stream, 3.4 KB)
