Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-01-31 05:12:54
Message-ID: CALj2ACWgKS=ZUJ0xcXhbqieJXMQxncLMQRx6nwtVc9Fp5H58RA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 27, 2022 at 6:31 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> I spent some time thinking about the right way to proceed here, and I came
> up with the attached patches. The first patch just adds error checking for
> various lstat() calls in the replication code. If lstat() fails, then it
> probably doesn't make sense to try to continue processing the file.
>
> The second patch changes some nearby calls to ereport() to ERROR. If these
> failures are truly unexpected, and we don't intend to support use-cases
> like concurrent manual deletion, then failing might be the right way to go.
> I think it's a shame that such failures could cause checkpointing to
> continually fail, but that topic is already being discussed elsewhere [0].
>
> [0] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com

After an off-list discussion with Andreas, proposing here a patch that
basically replaces ReadDir call with ReadDirExtended and gets rid of
lstat entirely. With this chance, the checkpoint will only care about
the snapshot and mapping files and not fail if it finds other files in
the directories. Removing lstat enables us to make things faster as we
avoid a bunch of extra system calls - one lstat call per each mapping
or snapshot file.

This patch also includes, converting "could not parse filename" and
"could not remove file" errors to LOG messages in
CheckPointLogicalRewriteHeap. This will enable checkpoint not to waste
the amount of work that it has done in case it is unable to
parse/remove the file, for whatever reasons.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v5-0001-Replace-ReadDir-with-ReadDirExtended.patch application/x-patch 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-01-31 05:17:41 Re: GUC flags
Previous Message David Rowley 2022-01-31 04:28:43 Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?