Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Date: 2022-03-28 20:30:27
Message-ID: 20220328203027.GK10577@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Nathan Bossart (nathandbossart(at)gmail(dot)com) wrote:
> On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote:
> > Looks like this change to an example in func.sgml is not quite right:
> >
> > -postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
> > +postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop());
> >
> > pg_backup_stop returns a record now, not just lsn. So this works for me:
> >
> > +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);
>
> Ah, good catch. I made this change in v7. I considered doing something
> like this
>
> SELECT w.* FROM pg_backup_stop() b, pg_walfile_name_offset(b.lsn) w;
>
> but I think your suggestion is simpler.

I tend to agree. More below.

> Subject: [PATCH v7 1/1] remove exclusive backup mode
> diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
> index 0d69851bb1..c8b914c1aa 100644
> --- a/doc/src/sgml/backup.sgml
> +++ b/doc/src/sgml/backup.sgml
> @@ -881,19 +873,19 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
> <listitem>
> <para>
> Connect to the server (it does not matter which database) as a user with
> - rights to run pg_start_backup (superuser, or a user who has been granted
> + rights to run pg_backup_start (superuser, or a user who has been granted
> EXECUTE on the function) and issue the command:
> <programlisting>
> -SELECT pg_start_backup('label', false, false);
> +SELECT pg_backup_start(label => 'label', fast => false);
> </programlisting>
> where <literal>label</literal> is any string you want to use to uniquely
> identify this backup operation. The connection
> - calling <function>pg_start_backup</function> must be maintained until the end of
> + calling <function>pg_backup_start</function> must be maintained until the end of
> the backup, or the backup will be automatically aborted.
> </para>
>
> <para>
> - By default, <function>pg_start_backup</function> can take a long time to finish.
> + By default, <function>pg_backup_start</function> can take a long time to finish.
> This is because it performs a checkpoint, and the I/O
> required for the checkpoint will be spread out over a significant
> period of time, by default half your inter-checkpoint interval

Hrmpf. Not this patch's fault, but the default isn't 'half your
inter-checkpoint interval' anymore (checkpoint_completion_target was
changed to 0.9 ... let's not discuss who did that and missed fixing
this). Overall though, maybe we should reword this to avoid having to
remember to change it again if we ever change the completion target
again? Also it seems to imply that pg_backup_start is going to run its
own independent checkpoint or something, which isn't the typical case.
I generally explain what is going on here like this:

By default, <function>pg_backup_start</function> will wait for the next
checkpoint to complete (see ref: checkpoint_timeout ... maybe also
wal-configuration.html).

> @@ -937,7 +925,7 @@ SELECT * FROM pg_stop_backup(false, true);
> ready to archive.
> </para>
> <para>
> - The <function>pg_stop_backup</function> will return one row with three
> + The <function>pg_backup_stop</function> will return one row with three
> values. The second of these fields should be written to a file named
> <filename>backup_label</filename> in the root directory of the backup. The
> third field should be written to a file named

I get that we have <function> and </function>, but those are just
formatting and don't show up in the actual text, and so it ends up
being:

The pg_backup_stop will return

Which doesn't sound quite right to me. I'd say we either drop 'The' or
add 'function' after. I realize that this patch is mostly doing a
s/pg_stop_backup/pg_backup_stop/, but, still.

> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index 8a802fb225..73096708cc 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -25732,24 +25715,19 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
> <parameter>spcmapfile</parameter> <type>text</type> )
> </para>
> <para>
> - Finishes performing an exclusive or non-exclusive on-line backup.
> - The <parameter>exclusive</parameter> parameter must match the
> - previous <function>pg_start_backup</function> call.
> - In an exclusive backup, <function>pg_stop_backup</function> removes
> - the backup label file and, if it exists, the tablespace map file
> - created by <function>pg_start_backup</function>. In a non-exclusive
> - backup, the desired contents of these files are returned as part of
> + Finishes performing an on-line backup. The desired contents of the
> + backup label file and the tablespace map file are returned as part of
> the result of the function, and should be written to files in the
> backup area (not in the data directory).
> </para>

Given that this is a crucial part, which the exclusive mode has wrong,
I'd be a bit more forceful about it, eg:

The contents of the backup label file and the tablespace map file must
be stored as part of the backup and must NOT be stored in the live data
directory (doing so will cause PostgreSQL to fail to restart in the
event of a crash).

> @@ -25771,8 +25749,7 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
> The result of the function is a single record.
> The <parameter>lsn</parameter> column holds the backup's ending
> write-ahead log location (which again can be ignored). The second and
> - third columns are <literal>NULL</literal> when ending an exclusive
> - backup; after a non-exclusive backup they hold the desired contents of
> + third columns hold the desired contents of
> the label and tablespace map files.
> </para>
> <para>

I dislike saying 'desired' here as if it's something that we're nicely
asking but maybe isn't a big deal, and just saying 'label' isn't good
since we call it 'backup label' elsewhere and we should try hard to be
consistent and clear. How about:

The second column returns the contents of the backup label file, the
third column returns the contents of the tablespace map file. These
must be stored as part of the backup and are required as part of the
restore process.

> diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
> index 3c182c97d4..ee3fa148b6 100644
> --- a/src/bin/pg_ctl/pg_ctl.c
> +++ b/src/bin/pg_ctl/pg_ctl.c
> @@ -1069,7 +1069,7 @@ do_stop(void)
> get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
> {
> print_msg(_("WARNING: online backup mode is active\n"
> - "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
> + "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
> }
>
> print_msg(_("waiting for server to shut down..."));
> @@ -1145,7 +1145,7 @@ do_restart(void)
> get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
> {
> print_msg(_("WARNING: online backup mode is active\n"
> - "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
> + "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
> }
>
> print_msg(_("waiting for server to shut down..."));

This... can't actually happen anymore, right? Shouldn't we just pull
all of this out?

On a once-over of the rest of the code, I definitely like how much we're
able to simplify things in this area and remove various hacks in things
like pg_basebackup and pg_rewind where we previously had to worry about
backup_label and tablespace_map files being in a live data directory.
I'm planning to spend more time on this and hopefully be able to get it
in for v15.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Phil Krylov 2022-03-28 20:33:37 Re: jsonpath syntax extensions
Previous Message Andres Freund 2022-03-28 20:30:08 Re: Temporary tables versus wraparound... again