Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-02-15 17:57:53
Message-ID: 20220215175752.GA2413813@nathanxps13
Lists: pgsql-hackers

On Tue, Feb 15, 2022 at 09:09:52AM -0800, Andres Freund wrote:
> On 2022-02-10 21:30:45 +0530, Bharath Rupireddy wrote:
>> Replace ReadDir with ReadDirExtended (in CheckPointSnapBuild) and
>> get rid of lstat entirely.
>
> I think this might be based on a slight misunderstanding / bad phrasing on my
> part. We can use get_dirent_type() to optimize away the lstat on most
> platforms, but ReadDirExtended itself doesn't do that automatically. I was
> trying to suggest removing lstat calls by using get_dirent_type() in more
> places...
>
>
>> We still use ReadDir in CheckPointLogicalRewriteHeap
>> because being unable to read the directory would result in a NULL from
>> ReadDirExtended and we might fail to fsync the remaining map files,
>> so here let's error out with ReadDir.
>
> Then why is this skipping the lstat?
>
>
>> Also, convert "could not parse filename" and "could not remove file"
>> errors to LOG messages in CheckPointLogicalRewriteHeap. This will
>> keep the checkpoint from wasting the work that it has already done.
>
> I still doubt this is a good idea.

IIUC you are advocating for something more like the attached patches.
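
To illustrate the get_dirent_type() idea discussed above, here is a rough standalone sketch in plain C. It is not taken from the attached patches and does not use PostgreSQL's own helpers. The function names sketch_is_regular_file() and sketch_count_regular_files() are hypothetical, and I am assuming a platform whose struct dirent exposes d_type (glibc, the BSDs, macOS). The point is only to show the pattern: trust the type reported by readdir() when it is known, and fall back to lstat() otherwise, checking its return value.

```c
#define _DEFAULT_SOURCE   /* exposes d_type and the DT_* constants on glibc */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if the entry is a regular file, 0 if it is
 * something else, and -1 if the type could not be determined.
 */
static int
sketch_is_regular_file(const char *dirpath, const struct dirent *de)
{
    char        path[4096];
    struct stat st;
    int         len;

    assert(dirpath != NULL && de != NULL);

#ifdef DT_UNKNOWN
    /* Fast path: the directory entry already carries the file type. */
    if (de->d_type != DT_UNKNOWN)
        return de->d_type == DT_REG;
#endif

    /* Slow path: the filesystem did not report a type, so ask lstat(). */
    len = snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
    if (len < 0 || (size_t) len >= sizeof(path))
        return -1;
    if (lstat(path, &st) != 0)
        return -1;              /* report the failure instead of ignoring it */
    return S_ISREG(st.st_mode) ? 1 : 0;
}

/*
 * Hypothetical helper: counts regular files in dirpath, or returns -1 if the
 * directory cannot be opened or an entry's type cannot be determined.
 */
static int
sketch_count_regular_files(const char *dirpath)
{
    DIR        *dir;
    struct dirent *de;
    int         count = 0;

    assert(dirpath != NULL);

    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        int         isreg;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        isreg = sketch_is_regular_file(dirpath, de);
        if (isreg < 0)
        {
            closedir(dir);
            return -1;
        }
        count += isreg;
    }

    closedir(dir);
    return count;
}
```

On filesystems that fill in d_type, the loop above never calls lstat() at all, which is the saving being discussed. The fallback only runs when the type is reported as unknown.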

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment                                                        Content-Type  Size
v9-0001-make-use-of-get_dirent_type-in-replication-code.patch     text/x-diff   3.5 KB
v9-0002-add-error-checking-for-call-to-lstat-in-replicati.patch   text/x-diff   1.1 KB
v9-0003-minor-improvements-to-replication-code.patch              text/x-diff   1.7 KB
