Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
Date: 2022-09-20 11:43:49
Message-ID: CALj2ACWo57XEkJBnW+Ow=VBRPDuzt1ws8jbtZhswDbmrkQvOqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 20, 2022 at 7:20 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> The main regression test suite should not include direct calls to
> pg_backup_start() or pg_backup_stop() as these depend on wal_level,
> and we spend a certain amount of resources in keeping the tests a
> maximum portable across such configurations, wal_level = minimal being
> one of them. One portable example is in 001_stream_rep.pl.

Understood.

> > That's a good idea. I'm marking a flag if the label name overflows (>
> > MAXPGPATH), later using it in do_pg_backup_start() to error out. We
> > could've thrown error directly, but that changes the behaviour - right
> > now, first "
> > wal_level must be set to \"replica\" or \"logical\" at server start."
> > gets emitted and then label name overflow error - I don't want to do
> > that.
>
> - if (strlen(backupidstr) > MAXPGPATH)
> + if (state->name_overflowed == true)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("backup label too long (max %d bytes)",
> It does not strike me as a huge issue to force a truncation of such
> backup label names. 1024 is large enough for basically all users,
> in my opinion. Or you could just truncate in the internal logic, but
> still complain at MAXPGPATH - 1 as the last byte would be for the zero
> termination. In short, there is no need to complicate things with
> name_overflowed.

We currently allow MAXPGPATH bytes of label name excluding null
termination. I don't want to change this behaviour. In the attached v9
patch, I'm truncating the larger names to MAXPGPATH + 1 bytes in
backup state (one extra byte for representing that the name has
overflown, and another extra byte for null termination).

> + StringInfo backup_label; /* backup_label file contents */
> + StringInfo tablespace_map; /* tablespace_map file contents */
> + StringInfo history_file; /* history file contents */
> IMV, repeating a point I already made once upthread, BackupState
> should hold none of these. Let's just generate the contents of these
> files in the contexts where they are needed, making BackupState
> something to rely on to build them in the code paths where they are
> necessary. This is going to make the reasoning around the memory
> contexts where each one of them is stored much easier and reduce the
> changes of bugs in the long-term.

I've separated out these variables from the backup state, please see
the attached v9 patch.

> > I've also taken help of the error callback mechanism to clean up the
> > allocated memory in case of a failure. For do_pg_abort_backup() cases,
> > I think we don't need to clean the memory because that gets called on
> > proc exit (before_shmem_exit()).
>
> Memory could still bloat while the process running the SQL functions
> is running depending on the error code path, anyway.

I didn't get your point. Can you please elaborate it? I think adding
error callbacks at the right places would free up the memory for us.
Please note that we already are causing memory leaks on HEAD today.

I addressed the above review comments. I also changed a wrong comment
[1] that lies before pg_backup_start() even after the removal of
exclusive backup.

I'm attaching v9 patch set herewith, 0001 - refactors the backup code
with backup state, 0002 - adds error callbacks to clean up the memory
allocated for backup variables. Please review them further.

[1]
* Essentially what this does is to create a backup label file in $PGDATA,
* where it will be archived as part of the backup dump. The label file
* contains the user-supplied label string (typically this would be used
* to tell where the backup dump will be stored) and the starting time and
* starting WAL location for the dump.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v9-0001-Refactor-backup-related-code.patch application/x-patch 34.7 KB
v9-0002-Add-error-callbacks-for-deallocating-backup-relat.patch application/x-patch 8.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-09-20 12:00:58 Re: why can't a table be part of the same publication as its schema
Previous Message Bharath Rupireddy 2022-09-20 11:43:17 Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions