Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-08-09 05:26:39
Message-ID: CA+hUKGKPtfOW5k+O4O3_ZEKBBCitm90e3bkOU_GGcKXe5QACHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 9, 2022 at 4:28 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> On Tue, Aug 09, 2022 at 09:29:16AM +0530, Bharath Rupireddy wrote:
> > On Tue, Aug 9, 2022 at 9:20 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Hmmm ... I'll grant that ignoring lstat errors altogether isn't great.
> >> But should the replacement behavior be elog-LOG-and-press-on,
> >> or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any
> >> hurry to believe that the latter is more appropriate without some
> >> analysis of what the callers are doing.
> >>
> >> The bottom line here is that I'm distrustful of behavioral changes
> >> introduced to simplify refactoring rather than to solve a live
> >> problem.
> >
> > +1. I agree with Tom not to change elog-LOG to elog-ERROR and fail the
> > checkpoint operation. Because the checkpoint is more important than

The changes were not from elog-LOG to elog-ERROR, they were changes
from silently eating errors, but I take your (plural) general point.

> > why a single snapshot file (out thousands or even million files) isn't
> > removed at that moment. Also, I originally proposed to change
> > elog-ERROR to elog-LOG in CheckPointLogicalRewriteHeap for unlink()
> > failures for the same reason.
>
> This was my initial instinct as well, but this thread has received
> contradictory feedback during the months since.

OK, so there aren't many places in 0001 that error out where we
previously did not.

For the one in CheckPointLogicalRewriteHeap(), if it fails, you don't
just want to skip -- it might be in the range that we need to fsync().
So what if we just deleted that get_dirent_type() != PGFILETYPE_REG
check altogether? Depending on the name range check, we'd either
attempt to unlink() and LOG if that fails, or we'd attempt to
open()-or-ERROR and then fsync()-or-PANIC. What functionality have we
lost without the DT_REG check? Well, now if someone ill-advisedly
puts an important character device, socket, symlink (ie any kind of
non-DT_REG) into that directory with a name conforming to the
unlinkable range, we'll try to unlink it and possibly succeed, where
previously we'd have skipped, and if it's in the checkpointable range,
we'd try to fsync() it and likely fail, which is good because this is
a supposed to be a checkpoint.

That's already what happens if lstat() fails in master. We fall back
to treating it like a DT_REG anyway:

if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
continue;

For the one in CheckSnapBuild(), it's similar but unlink only.

For the one in StartupReplicationSlots(), you could possibly take the
same line: if it's not a directory, we'll try an rmdir() it anyway and
then emit a WARNING if that fails. Again, that's exactly what master
would do if that lstat() failed.

In other words, if we're going to LOG and press on, maybe we should
just press on anyway. Would the error message be any less
informative? Would the risk of unlinking a character device that you
accidentally stored in
/clusters/pg16-main/pgdata/pg_logical/mappings/map-1234-5678 be a
problem for anyone?

Separately from that topic, it would be nice to have a comment in each
place where we tolerate unlink() failure to explain why that is
harmless except for wasted disk.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2022-08-09 05:39:27 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message Tom Lane 2022-08-09 05:12:53 Re: optimize lookups in snapshot [sub]xip arrays