Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-08-10 07:14:54
Message-ID: CALj2ACVd0ukptEi5pFyeEx-nw_C9jTRvtQJvQpm0BUhOCvrH0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 9, 2022 at 10:57 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> OK, so there aren't many places in 0001 that error out where we
> previously did not.

Well, that's not true I believe. The 0001 patch introduces
get_dirent_type() with elevel ERROR which means that lstat failures
are now reported as ERRORs with ereport(ERROR, as opposed to
continuing on the HEAD.

- if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
+ if (get_dirent_type(path, mapping_de, false, ERROR) != PGFILETYPE_REG)
continue;

- if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
+ if (get_dirent_type(path, snap_de, false, ERROR) != PGFILETYPE_REG)
{

so on.

The main idea of using get_dirent_type() instead of lstat or stat is
to benefit from avoiding system calls on platforms where the directory
entry type is stored in dirent structure, but not to cause errors for
lstat or stat system calls failures where we previously didn't. I
think 0001 must be just replacing lstat or stat with get_dirent_type()
with elevel ERROR if lstat or stat failure is treated as ERROR
previously, otherwise with elevel LOG. I modified it as attached. Once
this patch gets committed, then we can continue the discussion as to
whether we make elog-LOG to elog-ERROR or vice-versa.

I'm tempted to use get_dirent_type() in RemoveXlogFile() but that
requires us to pass struct dirent * from RemoveOldXlogFiles() (as
attached 0002 for now), If done, this avoids one lstat() system call
per WAL file. IMO, that's okay to pass an extra function parameter to
RemoveXlogFile() as it is a static function and there can be many
(thousands of) WAL files at times, so saving system calls (lstat() *
number of WAL files) is definitely an advantage.

Thoughts?

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment Content-Type Size
v14-0001-Make-more-use-of-get_dirent_type.patch application/octet-stream 13.1 KB
v14-0002-Replace-lstat-with-get_dirent_type-in-xlog.c.patch application/octet-stream 2.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2022-08-10 07:33:45 Re: SUBTRANS: Minimizing calls to SubTransSetParent()
Previous Message John Naylor 2022-08-10 06:50:14 use SSE2 for is_valid_ascii