Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: amitlangote09(at)gmail(dot)com
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, masahiko(dot)sawada(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Date: 2020-02-05 08:30:38
Message-ID: 20200205.173038.1841242875909024142.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 5 Feb 2020 16:29:54 +0900, Amit Langote <amitlangote09(at)gmail(dot)com> wrote in
> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> > On 2020/02/04 10:34, Amit Langote wrote:
> > > I like:
> >
> > Thanks for reviewing the patch!
> >
> > > 1. initializing
> > > 2-5 waiting for backup start checkpoint to finish
> >
> > Can we shorten this to "waiting for checkpoint"? IMO the simpler
> > phase name is better and "to finish" sounds a bit redundant. Also
> > in the description of pg_stat_progress_create_index, basically
> > "waiting for xxx" is used.
>
> "waiting for checkpoint" works for me.

I'm not sure, but doesn't that mean "waiting for a checkpoint to
start"? Sorry in advance if that is not the case.

> > > 3-3 streaming database files
> > > 4-5 waiting for wal archiving to finish
> >
> > Can we shorten this to "waiting for wal archiving" because of
> > the same reason as above?
>
> Yes.
>
> > > 5-1 transferring wal (or streaming wal)
> >
> > IMO "transferring wal" is better because this phase happens only when
> > "--wal-method=fetch" is specified in pg_basebackup. "streaming wal"
> > seems to implie "--wal-method=stream", instead.
>
> Ah, okay,
>
> > > Reading this:
> > >
> > > + <entry><structfield>backup_total</structfield></entry>
> > > + <entry><type>bigint</type></entry>
> > > + <entry>
> > > + Total amount of data that will be streamed. If progress reporting
> > > + is not enabled in <application>pg_basebackup</application>
> > > + (i.e., <literal>--progress</literal> option is not specified),
> > > + this is <literal>0</literal>.
> > >
> > > I wonder how hard would it be to change basebackup.c to always set
> > > backup_total, irrespective of whether --progress is specified with
> > > pg_basebackup or not? It seems quite misleading to leave it set to 0,
> > > because one may panic that they have lost their data, that is, if they
> > > haven't first read the documentation.
> >
> > Yeah, I understand your concern. The pg_basebackup document explains
> > the risk when --progress is specified, as follows. Since I imagined that
> > someone may explicitly disable --progress to avoid this risk, I made
> > the server estimate the total size only when --progress is specified.
> > But you think that this overhead by --progress is negligibly small?
> >
> > --------------------
> > When this is enabled, the backup will start by enumerating the size of
> > the entire database, and then go back and send the actual contents.
> > This may make the backup take slightly longer, and in particular it will
> > take longer before the first data is sent.
> > --------------------
>
> Sorry, I hadn't read this before. So, my proposal would make this a lie.
>
> Still, if "streaming database files" is the longest phase, then not
> having even an approximation of how much data is to be streamed over
> doesn't much help estimating progress, at least as long as one only
> has this view to look at.
>
> That said, the overhead of checking the size before sending any data
> may be worse for some people than others, so having the option to
> avoid that might be good after all.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-02-05 08:32:55 Re: ALTER tbl rewrite loses CLUSTER ON index
Previous Message Kyotaro Horiguchi 2020-02-05 08:25:18 Re: A bug in LWLOCK_STATS