Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
Date: 2022-09-21 11:40:39
Message-ID: CALj2ACWNHCEefPXqFw7LfZa_b3UAi4uvvv6tf7qQSjRF2tL61A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 21, 2022 at 12:27 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> >>> I've also taken help of the error callback mechanism to clean up the
> >>> allocated memory in case of a failure. For do_pg_abort_backup() cases,
> >>> I think we don't need to clean the memory because that gets called on
> >>> proc exit (before_shmem_exit()).
> >>
> >> Memory could still bloat while the process running the SQL functions
> >> is running depending on the error code path, anyway.
> >
> > I didn't get your point. Can you please elaborate it? I think adding
> > error callbacks at the right places would free up the memory for us.
> > Please note that we already are causing memory leaks on HEAD today.
>
> I mean that HEAD makes no effort in freeing this memory in
> TopMemoryContext on session ERROR.

Correct. We can also solve it as part of this commit. Please let me
know your thoughts on 0002 patch.

> I have a few comments on 0001.
>
> +#include <access/xlogbackup.h>
> Double quotes wanted here.

Ah, my bad. Corrected now.

> deallocate_backup_variables() is the only part of xlogbackup.c that
> includes references of the tablespace map_and backup_label
> StringInfos. I would be tempted to fully decouple that from
> xlogbackup.c/h for the time being.

There's no problem with it IMO, after all, they are backup related
variables. And that function reduces a bit of duplicate code.

> - tblspc_map_file = makeStringInfo();
> Not sure that there is a need for a rename here.

We're referring tablespace_map and backup_label internally all around,
just to be in sync, I wanted to rename it while we're refactoring this
code.

> +void
> +build_backup_content(BackupState *state, bool ishistoryfile, StringInfo str)
> +{
> It would be more natural to have build_backup_content() do by itself
> the initialization of the StringInfo for the contents of backup_label
> and return it as a result of the routine? This is short-lived in
> xlogfuncs.c when the backup ends.

See the below explaination.

> @@ -248,18 +250,25 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
> [...]
> + /* Construct backup_label contents. */
> + build_backup_content(backup_state, false, backup_label);
>
> Actually, for base backups, perhaps it would be more intuitive to
> build and free the StringInfo of the backup_label when we send it for
> base.tar rather than initializing it at the beginning and freeing it
> at the end?

sendFileWithContent() is in a for-loop and we are good if we call
build_backup_content() before do_pg_backup_start() just once. Also,
allocating backup_label in the for loop makes error handling trickier,
how do we free-up when sendFileWithContent() errors out? Of course, we
can allocate backup_label once even in the for loop with bool
first_time sort of variable and store StringInfo *ptr_backup_label; in
error callback info, but that would make things unnecessarily complex,
instead we're good allocating and creating backup_label content at the
beginning and freeing-it up at the end.

> - * pg_backup_start: set up for taking an on-line backup dump
> + * pg_backup_start: start taking an on-line backup.
> *
> - * Essentially what this does is to create a backup label file in $PGDATA,
> - * where it will be archived as part of the backup dump. The label file
> - * contains the user-supplied label string (typically this would be used
> - * to tell where the backup dump will be stored) and the starting time and
> - * starting WAL location for the dump.
> + * This function starts the backup and creates tablespace_map contents.
>
> The last part of the comment is still correct while the former is not,
> so this loses some information.

Added that part before pg_backup_stop() now where it makes sense with
the refactoring.

I'm attaching the v10 patch-set with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v10-0001-Refactor-backup-related-code.patch application/x-patch 33.5 KB
v10-0002-Add-error-callbacks-for-deallocating-backup-rela.patch application/x-patch 8.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2022-09-21 11:43:15 Re: Hash index build performance tweak from sorting
Previous Message Amit Kapila 2022-09-21 11:40:00 Re: Perform streaming logical transactions by background workers and parallel apply