Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Date: 2020-02-05 06:36:15
Message-ID: 06600d54-d174-ff31-2989-561d86bbf4fb@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/02/04 10:34, Amit Langote wrote:
> On Mon, Feb 3, 2020 at 11:04 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>> So we now have the following ideas about the phase names for pg_basebackup.
>>
>> 1.
>> initializing
>>
>> 2.
>> 2-1. starting backup
>> 2-2. starting file transfer
>> 2-3. performing pg_start_backup
>> 2-4. performing checkpoint
>> 2-5. waiting for [ backup start ] checkpoint to finish
>>
>> 3.
>> 3-1. streaming backup
>> 3-2. transferring database files
>> 3-3. streaming database files
>> 3-4. transferring files
>>
>> 4.
>> 4-1. stopping backup
>> 4-2. finishing file transfer
>> 4-3. performing pg_stop_backup
>> 4-4. finishing backup
>> 4-5. waiting for WAL archiving to finish
>> 4-6. performing WAL archive
>>
>> 5.
>> 1. transferring wal
>> 2. transferring WAL files
>>
>> What conbination of these do you prefer?
>
> I like:

Thanks for reviewing the patch!

> 1. initializing
> 2-5 waiting for backup start checkpoint to finish

Can we shorten this to "waiting for checkpoint"? IMO the simpler
phase name is better and "to finish" sounds a bit redundant. Also
in the description of pg_stat_progress_create_index, basically
"waiting for xxx" is used.

> 3-3 streaming database files
> 4-5 waiting for wal archiving to finish

Can we shorten this to "waiting for wal archiving" because of
the same reason as above?

> 5-1 transferring wal (or streaming wal)

IMO "transferring wal" is better because this phase happens only when
"--wal-method=fetch" is specified in pg_basebackup. "streaming wal"
seems to implie "--wal-method=stream", instead.

>>> - <varlistentry>
>>> + <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
>>>
>>> I don't see any new text in the documentation patch that uses above
>>> xref, so no need to define it?
>>
>> The following description that I added uses this.
>>
>> certain commands during command execution. Currently, the only commands
>> which support progress reporting are <command>ANALYZE</command>,
>> <command>CLUSTER</command>,
>> - <command>CREATE INDEX</command>, and <command>VACUUM</command>.
>> + <command>CREATE INDEX</command>, <command>VACUUM</command>,
>> + and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
>> + command that <xref linkend="app-pgbasebackup"/> issues to take
>> + a base backup).
>
> Sorry, I missed that. I was mistakenly expecting a different value of linkend.
>
> Some comments on v3:
>
> + <entry>Process ID of a WAL sender process.</entry>
>
> "a" sounds redundant. Maybe:
>
> of this WAL sender process or
> of WAL sender process

I borrowed "Process ID of a WAL sender process" from the description
of pg_stat_replication.pid. But if it's better to get rid of "a",
I'm happy to do that!

> Reading this:
>
> + <entry><structfield>backup_total</structfield></entry>
> + <entry><type>bigint</type></entry>
> + <entry>
> + Total amount of data that will be streamed. If progress reporting
> + is not enabled in <application>pg_basebackup</application>
> + (i.e., <literal>--progress</literal> option is not specified),
> + this is <literal>0</literal>.
>
> I wonder how hard would it be to change basebackup.c to always set
> backup_total, irrespective of whether --progress is specified with
> pg_basebackup or not? It seems quite misleading to leave it set to 0,
> because one may panic that they have lost their data, that is, if they
> haven't first read the documentation.

Yeah, I understand your concern. The pg_basebackup document explains
the risk when --progress is specified, as follows. Since I imagined that
someone may explicitly disable --progress to avoid this risk, I made
the server estimate the total size only when --progress is specified.
But you think that this overhead by --progress is negligibly small?

--------------------
When this is enabled, the backup will start by enumerating the size of
the entire database, and then go back and send the actual contents.
This may make the backup take slightly longer, and in particular it will
take longer before the first data is sent.
--------------------

If we really can always estimate the total size, whether --progress is
specified or not, we should get rid of PROGRESS option from BASE_BACKUP
replication command because it will no longer be necessary, I think.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-02-05 06:36:45 Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Previous Message Surafel Temesgen 2020-02-05 06:06:32 Re: [PATCH v1] Allow COPY "text" format to output a header