Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-08-09 03:59:16
Message-ID: CALj2ACX3u2hvp=jiOZ20m0dKXxieNy=uhYfP6d1WTZh0pdso0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 9, 2022 at 9:20 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > On Tue, Aug 9, 2022 at 3:27 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I have not tried to analyze the error-handling properties of 0001,
> >> but if it's being equally cavalier then it shouldn't be committed
> >> either.
>
> > 0001 does introduce new errors, as mentioned in the commit message, in
> > the form of elevel ERROR passed into get_dirent_type(), which might be
> > thrown if your OS has no d_type and lstat() fails (also if you asked
> > to follow symlinks, but in those cases errors were already thrown).
> > But in those cases, it seems at least a little fishy that we ignored
> > errors from the existing lstat().
>
> Hmmm ... I'll grant that ignoring lstat errors altogether isn't great.
> But should the replacement behavior be elog-LOG-and-press-on,
> or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any
> hurry to believe that the latter is more appropriate without some
> analysis of what the callers are doing.
>
> The bottom line here is that I'm distrustful of behavioral changes
> introduced to simplify refactoring rather than to solve a live
> problem.

+1. I agree with Tom not to change elog-LOG to elog-ERROR and fail the
checkpoint operation. Because the checkpoint is more important than
why a single snapshot file (out thousands or even million files) isn't
removed at that moment. Also, I originally proposed to change
elog-ERROR to elog-LOG in CheckPointLogicalRewriteHeap for unlink()
failures for the same reason.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-08-09 04:07:54 Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Previous Message Nathan Bossart 2022-08-09 03:51:29 Re: optimize lookups in snapshot [sub]xip arrays