Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-02-01 23:55:38
Message-ID: 20220201235538.GA740616@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote:
> After an off-list discussion with Andreas, proposing here a patch that
> basically replaces ReadDir call with ReadDirExtended and gets rid of
> lstat entirely. With this chance, the checkpoint will only care about
> the snapshot and mapping files and not fail if it finds other files in
> the directories. Removing lstat enables us to make things faster as we
> avoid a bunch of extra system calls - one lstat call per each mapping
> or snapshot file.

I think removing the lstat() is probably reasonable. We currently aren't
doing proper error checking, and the chances of a non-regular file matching
the prefix are likely pretty low. In the worst case, we'll LOG or ERROR
when unlinking or fsyncing fails.

However, I'm not sure about the change to ReadDirExtended(). That might be
okay for CheckPointSnapBuild(), which is just trying to remove old files,
but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
are flushed to disk for the checkpoint. If we stop reading the directory
after an error and let the checkpoint continue, isn't it possible that some
mappings files won't be persisted to disk?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Todd Hubers 2022-02-01 23:56:02 Re: Feature Proposal: Connection Pool Optimization - Change the Connection User
Previous Message Andres Freund 2022-02-01 23:43:31 Re: Replace uses of deprecated Python module distutils.sysconfig