Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-08-10 09:58:25
Message-ID: CALj2ACWSjE8AiJ4xaBS4ULQ8T6RevwsNhTD6yPBLugJBTD5LCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 10, 2022 at 2:11 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Wed, Aug 10, 2022 at 7:15 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > The main idea of using get_dirent_type() instead of lstat or stat is
> > to benefit from avoiding system calls on platforms where the directory
> > entry type is stored in dirent structure, but not to cause errors for
> > lstat or stat system calls failures where we previously didn't.
>
> Yeah, I get it... I'm OK with separating mechanical changes like
> lstat() -> get_dirent_type() from policy changes on error handling. I
> initially thought the ERROR was surely better than what we're doing
> now (making extra system calls to avoid unlinking unexpected things,
> for which our only strategy on failure is to try to unlink the thing
> anyway), but it's better to discuss that separately.
>
> > I
> > think 0001 must be just replacing lstat or stat with get_dirent_type()
> > with elevel ERROR if lstat or stat failure is treated as ERROR
> > previously, otherwise with elevel LOG. I modified it as attached. Once
> > this patch gets committed, then we can continue the discussion as to
> > whether we make elog-LOG to elog-ERROR or vice-versa.
>
> Agreed, will look at this version tomorrow.

Thanks.

> > I'm tempted to use get_dirent_type() in RemoveXlogFile() but that
> > requires us to pass struct dirent * from RemoveOldXlogFiles() (as
> > attached 0002 for now), If done, this avoids one lstat() system call
> > per WAL file. IMO, that's okay to pass an extra function parameter to
> > RemoveXlogFile() as it is a static function and there can be many
> > (thousands of) WAL files at times, so saving system calls (lstat() *
> > number of WAL files) is definitely an advantage.
>
> - lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
> + get_dirent_type(path, xlde, false, LOG) != PGFILETYPE_REG &&
>
> I think you wanted ==, but maybe it'd be nicer to pass in a
> "recyclable" boolean to RemoveXlogFile()?

Ah, my bad, I corrected it now.

If you mean to do "bool recyclable = (get_dirent_type(path, xlde,
false, LOG) == PGFILETYPE_REG)"" and pass it as parameter to
RemoveXlogFile(), then we have to do get_dirent_type calls for every
WAL file even when any of (wal_recycle && *endlogSegNo <= recycleSegNo
&& XLogCtl->InstallXLogFileSegmentActive) is false which is
unnecessary and it's better to pass dirent structure as a parameter.

> If you're looking for more, rmtree() looks like another.

Are you suggesting to not use pgfnames() in rmtree() and do
opendir()-readdir()-get_dirent_type() directly? If yes, I don't think
it's a great idea because rmtree() has recursive calls and holding
opendir()-readdir() for all rmtree() recursive calls without
closedir() may eat up dynamic memory if there are deeper level
directories. I would like to leave it that way. Do you have any other
ideas in mind?

Please find the v15 patch, I merged the RemoveXlogFile() changes into 0001.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment Content-Type Size
v15-0001-Make-more-use-of-get_dirent_type-in-place-of-sta.patch application/octet-stream 15.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-08-10 10:57:46 Re: NAMEDATALEN increase because of non-latin languages
Previous Message Peter Smith 2022-08-10 09:40:26 Re: Perform streaming logical transactions by background workers and parallel apply