Re: thinko in basic_archive.c

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: robertmhaas(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: thinko in basic_archive.c
Date: 2022-10-19 04:51:12
Message-ID: CALj2ACXJjR+4qmEunj4RtBE4rGKkW8S9BLp7qfx+62sbBYkW-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 19, 2022 at 8:58 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> Unbounded number of sequential crash-restarts itself is a more serious
> problem..

Agree. The basic_archive module currently leaves temp files around
even for normal restarts of the cluster, which is bad IMO.

> An archive module could clean up them at startup or at the first call
> but that might be dangerous (since archive directory is I think
> thought as external resource).

The archive module must be responsible for cleaning up the temp file
that it creates. One way to do it is in the archive module's shutdown
callback, this covers most of the cases, but not all.

> Honestly, I'm not sure about a reasonable scenario where simultaneous
> archivings of a same file is acceptable, though. I feel that we should
> not allow simultaneous archiving of the same segment by some kind of
> interlocking. In other words, we might should serialize duplicate
> archiving of asame file.

In a typical production environment, there's some kind of locking (for
instance lease files) that allow/disallow file
creation/deletion/writes/reads which guarantees that the same file
isn't put into a directory (can be archive location) many times. And
as you rightly said archive_directory is something external to
postgres and we really can't deal with concurrent writers
writing/creating the same files. Even if we somehow try to do it, it
makes things complicated. This is true for any PGDATA directories.
However, the archive module implementers can choose to define such a
locking strategy.

> In another direction, the current code allows duplicate simultaneous
> copying to temporary files with different names then the latest
> renaming wins. We reach the almost same result (on Linuxen (or
> POSIX?)) by unlinking the existing tempfile first then create a new
> one with the same name then continue. Even if the tempfile were left
> alone after a crash, that file would be unlinked at the next trial of
> archiving. But I'm not sure how this can be done on Windows.. In the
> first place I'm not sure that the latest-unconditionally-wins policy
> is appropriate or not, though.

We really can't just unlink the temp file because it has pid and
timestamp in the filename and it's hard to determine the temp file
that we created earlier.

As far as the basic_archive module is concerned, we ought to keep it
simple. I still think the simplest we can do is to use the
basic_archive's shutdown_cb to delete (in most of the cases, but not
all) the left-over temp file that the module is dealing with
as-of-the-moment and add a note about the users dealing with
concurrent writers to the basic_archive.archive_directory like the
attached v2 patch.

> > A simple server restart while the basic_archive module is copying
> > to/from temp file would leave the file behind, see[1]. I think we can
> > fix this by defining shutdown_cb for the basic_archive module, like
> > the attached patch. While this helps in most of the crashes, but not
> > all. However, this is better than what we have right now.
>
> ShutdownWalRecovery() does the similar thing, but as you say this one
> covers rather narrow cases than that since RestoreArchiveFile()
> finally overwrites the left-alone files at the next call for that
> file.

We're using unique temp file names in the basic_archive module so we
can't overwrite the same upon restart.

> # The patch seems forgetting to clear the tmepfilepath *after* a
> # successful renaming.

It does so at the beginning of basic_archive_file() which is sufficient.

> And I don't see how the callback is called.

call_archive_module_shutdown_callback()->basic_archive_shutdown().

Please see the attached v2 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v2-0001-Remove-leftover-temporary-files-via-basic_archive.patch application/octet-stream 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-10-19 05:22:04 Re: fix archive module shutdown callback
Previous Message Peter Smith 2022-10-19 04:40:31 Re: create subscription - improved warning message