Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-07-18 11:23:18
Message-ID: CALj2ACUYMuO027XNJNsBfaPUhA+6fC=S0HjasQ4BpUQt0W9eSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 8, 2022 at 10:44 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> > 0002 - I'm not quite happy with this patch, with the change,
> > checkpoint errors out, if it can't remove just a file - the comments
> > there says it all. Is there any strong reason for this change?
>
> Andres noted several concerns upthread. In short, ignoring unexpected
> errors makes them harder to debug and likely masks bugs.
>
> FWIW I agree that it is unfortunate that a relatively non-critical error
> here leads to checkpoint failures, which can cause much worse problems down
> the road. I think this is one of the reasons for moving tasks like this
> out of the checkpointer, as I'm trying to do with the proposed custodian
> process [0].
>
> [0] https://commitfest.postgresql.org/38/3448/

IMHO, we can keep it as-is and if required can be changed as part of
the patch set [0], as this change without [0] can cause a checkpoint
to fail. Furthermore, I would like it if we convert "could not parse
filename" and "could not remove file" ERRORs of
CheckPointLogicalRewriteHeap to LOGs until [0] gets in - others may
have different opinions though.

Just wondering - do we ever have a problem if we can't remove the
snapshot or mapping file?

May be unrelated, RemoveTempXlogFiles doesn't even bother to check if
the temp wal file is removed.

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gaddam Sai Ram 2022-07-18 11:24:47 Re: make install-world fails sometimes in Mac M1
Previous Message Dilip Kumar 2022-07-18 11:21:00 Re: making relfilenodes 56 bits