Re: Allow pg_archivecleanup to remove backup history files

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Allow pg_archivecleanup to remove backup history files
Date: 2023-05-15 00:18:32
Message-ID: ZGF6WNGoVcUwi2EF@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 12, 2023 at 05:53:45PM +0900, torikoshia wrote:
> On 2023-05-10 17:52, Bharath Rupireddy wrote:
> I was a little concerned about what to do when deleting both the files
> ending in .gz and backup history files.
> Is making it possible to specify both "-x .backup" and "-x .gz" the way to
> go?
>
> I also concerned someone might add ".backup" to WAL files, but does that
> usually not happen?

Depends on the archive command, of course. I have seen code using
this suffix for some segment names in the past, FWIW.

>> Comments on the patch:
>> 1. Why just only the backup history files? Why not remove the timeline
>> history files too? Is it because there may not be as many tli switches
>> happening as backups?
>
> Yeah, do you think we should also add logic for '-x .history'?

Timeline history files can be critical pieces when it comes to
assigning a TLI as these could be retrieved by a restore_command
during recovery for a TLI jump or just assign a new TLI number at the
end of recovery, still you could presumably remove the TLI history
files that are older than the TLI the segment defined refers too.
(pg_archivecleanup has no specific logic to look at the history with
child TLIs for example, to keep it simple, and I'd rather keep it this
way). There may be an argument for making that optional, of course,
but it does not strike me as really necessary compared to the backup
history files which are just around for debugging purposes.

>> 2.+sub remove_backuphistoryfile_run_check
>> +{
>> Why to invent a new function when run_check() can be made generic with
>> few arguments passed?
>
> Thanks, I'm going to reconsider it.

+ <para>
+ Remove files including backup history files, whose suffix is <filename>.backup</filename>.
+ Note that when <replaceable>oldestkeptwalfile</replaceable> is a backup history file,
+ specified file is kept and only preceding WAL files and backup history files are removed.
+ </para>

This addition to the documentation looks unprecise to me. Backup
history files have a more complex format than just the .backup
suffix, and this is documented in backup.sgml.

How about plugging in some long options, and use something more
explicit like --clean-backup-history?

- if ((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) &&
+ if (((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) ||
+ (removeBackupHistoryFile && IsBackupHistoryFileName(walfile))) &&
strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)

Could it be a bit cleaner to split this check in two, saving one level
of indentation on the way for its most inner loop? I would imagine
something like:
/* Check file name */
if (!IsXLogFileName(walfile) &&
!IsPartialXLogFileName(walfile))
continue;
/* Check cutoff point */
if (strcmp(walfile + 8, exclusiveCleanupFileName + 8) >= 0)
continue;
//rest of the code doing the unlinks.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-05-15 01:06:29 Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Previous Message Steve Chavez 2023-05-15 00:00:40 Re: Using make_ctags leaves tags files in git