Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-07-08 17:14:39
Message-ID: 20220708171439.GB2356733@nathanxps13
Lists: pgsql-hackers

On Fri, Jul 08, 2022 at 09:39:10PM +0530, Bharath Rupireddy wrote:
> 0001 - there are many places where lstat/stat is being used - don't we
> need to replace all or most of them with get_dirent_type?

It's been a while since I wrote this one, but I believe my intent was to
replace as many [l]stat() calls in ReadDir()-style loops as possible with
get_dirent_type(). Are there any that I've missed?
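For readers unfamiliar with the idea, here is a minimal standalone sketch of
the technique: use the type that readdir() already returned and fall back to
lstat() only when the filesystem did not report one. This is not the
PostgreSQL implementation of get_dirent_type(); the names entry_type,
classify_entry() and count_regular_files() are hypothetical, and it assumes a
platform whose struct dirent carries d_type (glibc, the BSDs, macOS).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE			/* expose DT_* macros on glibc */
#endif
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical result type for this sketch. */
typedef enum
{
	ENTRY_ERROR,
	ENTRY_UNKNOWN,
	ENTRY_REG,
	ENTRY_DIR,
	ENTRY_LNK
} entry_type;

/* Classify one entry; lstat() is called only if d_type is uninformative. */
static entry_type
classify_entry(const char *path, const struct dirent *de)
{
	struct stat st;

	assert(path != NULL && de != NULL);

#ifdef DT_UNKNOWN
	switch (de->d_type)
	{
		case DT_REG:
			return ENTRY_REG;
		case DT_DIR:
			return ENTRY_DIR;
		case DT_LNK:
			return ENTRY_LNK;
		case DT_UNKNOWN:
			break;				/* filesystem did not say; fall back */
		default:
			return ENTRY_UNKNOWN;
	}
#endif

	if (lstat(path, &st) != 0)
		return ENTRY_ERROR;
	if (S_ISREG(st.st_mode))
		return ENTRY_REG;
	if (S_ISDIR(st.st_mode))
		return ENTRY_DIR;
	if (S_ISLNK(st.st_mode))
		return ENTRY_LNK;
	return ENTRY_UNKNOWN;
}

/* Count regular files in dirpath; returns -1 if it cannot be opened. */
static int
count_regular_files(const char *dirpath)
{
	DIR		   *dir = opendir(dirpath);
	struct dirent *de;
	char		path[4096];
	int			count = 0;

	if (dir == NULL)
		return -1;

	while ((de = readdir(dir)) != NULL)
	{
		snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
		if (classify_entry(path, de) == ENTRY_REG)
			count++;
	}

	closedir(dir);
	return count;
}
```

On filesystems that fill in d_type, the loop above makes no stat calls at
all, which is the saving one would hope for in a ReadDir()-style loop.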

> 0002 - I'm not quite happy with this patch. With the change, the
> checkpoint errors out if it can't remove even a single file - the
> comments there say it all. Is there any strong reason for this change?

Andres noted several concerns upthread. In short, ignoring unexpected
errors makes them harder to debug and likely masks bugs.
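To illustrate the distinction between expected and unexpected errors, here
is a rough standalone sketch. It is not code from the patch: the function
name remove_file_strict() is hypothetical, and treating ENOENT as the only
"expected" failure is an assumption made for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Remove a file.  Returns 0 if the file is gone afterwards, or -1 after
 * reporting an unexpected error, so the caller cannot silently miss it.
 */
static int
remove_file_strict(const char *path)
{
	if (unlink(path) == 0)
		return 0;

	if (errno == ENOENT)
		return 0;				/* already gone: assumed to be harmless */

	/* Anything else (permissions, I/O error, ...) is surfaced. */
	fprintf(stderr, "could not remove file \"%s\": %s\n",
			path, strerror(errno));
	return -1;
}
```

Whether the caller should then fail the whole operation or merely log the
problem is exactly the trade-off being discussed in this thread.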

FWIW I agree that it is unfortunate that a relatively non-critical error
here leads to checkpoint failures, which can cause much worse problems down
the road. I think this is one of the reasons for moving tasks like this
out of the checkpointer, as I'm trying to do with the proposed custodian
process [0].

[0] https://commitfest.postgresql.org/38/3448/

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
