Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-02-02 18:37:38
Message-ID: 20220202183738.GA746893@nathanxps13
Lists: pgsql-hackers

On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote:
> On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> However, I'm not sure about the change to ReadDirExtended(). That might be
>> okay for CheckPointSnapBuild(), which is just trying to remove old files,
>> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
>> are flushed to disk for the checkpoint. If we stop reading the directory
>> after an error and let the checkpoint continue, isn't it possible that some
>> mappings files won't be persisted to disk?
>
> Unless I mis-read your above statement, with LOG level in
> ReadDirExtended, I don't think we stop reading the files in
> CheckPointLogicalRewriteHeap. Am I missing something here?

ReadDirExtended() has the following comment:

* If elevel < ERROR, returns NULL after any error. With the normal coding
* pattern, this will result in falling out of the loop immediately as
* though the directory contained no (more) entries.
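To make that comment concrete, here is a minimal standalone sketch in plain POSIX C. It is not PostgreSQL's fd.c code. read_dir_log(), count_entries(), and the SKETCH_LOG/SKETCH_ERROR levels are hypothetical stand-ins for ReadDirExtended() and its elevel argument, and only their ordering matters. readdir() returns NULL both at the end of the directory and on failure, so errno is cleared before each call to tell the two apart. Below ERROR level the error is only reported, and the caller's loop ends exactly as if there were no more entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for elevel values; only LOG < ERROR matters. */
enum { SKETCH_LOG = 1, SKETCH_ERROR = 2 };

/*
 * Hypothetical analogue of ReadDirExtended(): returns the next entry,
 * or NULL at end of directory *or* after a read error when the level
 * is below SKETCH_ERROR.
 */
static struct dirent *
read_dir_log(DIR *dir, const char *dirname, int elevel)
{
    struct dirent *de;

    errno = 0;                  /* readdir() leaves errno alone at EOF */
    de = readdir(dir);
    if (de != NULL)
        return de;

    if (errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                dirname, strerror(errno));
        if (elevel >= SKETCH_ERROR)
            exit(EXIT_FAILURE); /* stands in for ereport(ERROR) */
    }
    return NULL;                /* caller cannot tell EOF from a logged error */
}

/* The "normal coding pattern": loop until NULL, skipping "." and "..". */
static int
count_entries(const char *dirname)
{
    DIR *dir = opendir(dirname);
    struct dirent *de;
    int n = 0;

    if (dir == NULL)
        return -1;
    while ((de = read_dir_log(dir, dirname, SKETCH_LOG)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(dir);
    return n;
}
```

With SKETCH_LOG, a readdir() failure halfway through simply makes count_entries() return a smaller number, with nothing but a log line to show that entries were skipped.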

If there is a problem reading the directory, ReadDirExtended() will LOG it
and return NULL, so we exit the loop. If we didn't scan through all the
entries in the directory, some of the files that need an fsync() may never
get one.
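A sketch of how a caller could avoid that, again in plain POSIX C and under stated assumptions: sync_all_files() is a hypothetical helper, not CheckPointLogicalRewriteHeap(). It fsync()s every entry it sees and checks errno after readdir() returns NULL, so a scan that was cut short by an error is reported as a failure (-1) instead of looking like a complete pass.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: fsync() every entry in dirname (skipping "." and
 * "..") and return the number of files flushed, or -1 if anything failed,
 * including a readdir() error that ended the scan early.  A caller that
 * gets -1 must not assume that all files are durable.
 */
static int
sync_all_files(const char *dirname)
{
    DIR *dir = opendir(dirname);
    struct dirent *de;
    int synced = 0;
    int failed = 0;

    if (dir == NULL)
        return -1;

    for (;;)
    {
        char path[4096];
        int fd;

        errno = 0;              /* distinguish end-of-directory from error */
        de = readdir(dir);
        if (de == NULL)
        {
            if (errno != 0)
                failed = 1;     /* scan cut short: some files may be unsynced */
            break;
        }
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(path, sizeof(path), "%s/%s", dirname, de->d_name);
        fd = open(path, O_RDONLY);
        if (fd < 0 || fsync(fd) != 0)
            failed = 1;
        else
            synced++;
        if (fd >= 0)
            close(fd);
    }
    closedir(dir);
    return failed ? -1 : synced;
}
```

Returning an error rather than only logging is the point: a checkpoint that depended on this pass could then fail instead of completing with some mapping files still unflushed.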

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
