Re: thinko in basic_archive.c

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: robertmhaas(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: thinko in basic_archive.c
Date: 2022-10-20 07:59:12
Message-ID: CALj2ACUi_bzt=U7Nx5A2H0i-iK3G3EeJoe7MMKiNERmj6O3A=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 20, 2022 at 6:57 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> > The archive module must be responsible for cleaning up the temp file
> > that it creates. One way to do it is in the archive module's shutdown
> > callback, this covers most of the cases, but not all.
>
> True. But I agree to Robert that such temporary files should be
> cleanup-able without needing temporarily-valid knowledge (current file
> name, in this case). A common strategy for this is to name those files
> by names that can be identifed as garbage.

I'm not sure how we can distinguish temp files as garbage based on
name. As Robert pointed out upthread, using system identifier may not
help as the standbys share the same system identifier and it's
possible that they might archive to the same directory. Is there any
other way?

> But since power cut is a typical crash source, we need to identify
> ruined temporary files and the current naming convention is incomplete
> in this regard.

Please note that basic_archive module creates one temp file at a time
to make file copying/moving atomic and it can keep track of the temp
file name and delete it using shutdown callback which helps in most of
the scenarios. As said upthread, repeated crashes while basic_archive
module is atomically copying files around is a big problem in itself
and basic_archive module need not worry about it much.

> flock() on nfs..
>
> The worst case I can come up with regardless of feasibility is a
> multi-standby physical replication set where all hosts share one
> archive directory. Indeed live and dead temprary files can coexist
> there. However, I think we can identify truly rotten temp files by
> inserting host name or cluster name (means cluster_name in
> postgresql.conf) even in that case. This premise that DBA names every
> cluster differently, but I think DBAs that is going to configure such
> a system are required to be very cautious about that kind of aspect.

Well, these ideas are great! However, we can leave defining such
strategies to archive module implementors. IMO, the basich_archive
module ought to be as simple and elegant as possible yet showing up
the usability of archive modules feature.

> Of course, it premised that a cluster uses the same name for a
> segment. If we insert cluseter_name into the temprary name, a starting
> cluster can indentify garbage files to clean up. For example if we
> name them as follows.
>
> ARCHTEMP_cluster1_pid_time_<lsn>
>
> A starting cluster can clean up all files starts with
> "archtemp_cluster1_*". (We need to choose the delimiter carefully,
> though..)

Postgres cleaning up basic_archive modules temp files at the start up
isn't a great idea IMO. Because these files are not related to server
functionality in any way unlike temp files removed in
RemovePgTempFiles(). IMO, we ought to keep the basic_archive module
simple.
.
> No. I didn't mean that, If server stops after a successfull
> durable_rename but before the next call to
> basic_archive_file_internal, that call back makes false comlaint since
> that temprary file is actually gone.

Right. Fixed it.

> > Please see the attached v2 patch.
>
> +static char tempfilepath[MAXPGPATH + 256];
>
> MAXPGPATH is the maximum length of a file name that PG assumes to be
> able to handle. Thus extending that length seems wrong.

I think it was to accommodate the temp file name - "archtemp", file,
MyProcPid, epoch, but I agree that it can just be MAXPGPATH. However,
most of the places the core defines the path name to be MAXPGPATH +
some bytes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v3-0001-Remove-leftover-temporary-files-via-basic_archive.patch application/x-patch 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-10-20 08:29:57 Re: Standby recovers records from wrong timeline
Previous Message Bharath Rupireddy 2022-10-20 07:58:45 Re: pg_recvlogical prints bogus error when interrupted