Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Date: 2020-02-06 02:35:41
Message-ID: CA+HiwqHdq8KB8MVt4x+WRqBfFfmJxAim+2fdwqJF_PjFOTZ-eA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 5, 2020 at 4:29 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> > Yeah, I understand your concern. The pg_basebackup document explains
> > the risk when --progress is specified, as follows. Since I imagined that
> > someone may explicitly disable --progress to avoid this risk, I made
> > the server estimate the total size only when --progress is specified.
> > But you think that this overhead by --progress is negligibly small?
> >
> > --------------------
> > When this is enabled, the backup will start by enumerating the size of
> > the entire database, and then go back and send the actual contents.
> > This may make the backup take slightly longer, and in particular it will
> > take longer before the first data is sent.
> > --------------------
>
> Sorry, I hadn't read this before. So, my proposal would make this a lie.
>
> Still, if "streaming database files" is the longest phase, then not
> having even an approximation of how much data is to be streamed over
> doesn't much help estimating progress, at least as long as one only
> has this view to look at.
>
> That said, the overhead of checking the size before sending any data
> may be worse for some people than others, so having the option to
> avoid that might be good after all.

By the way, if calculating the total backup size can take a
significantly long time in some cases (that is, when it is requested
by specifying --progress), then it might be a good idea to define a
separate phase for it, like "estimating backup size" or some such.
Currently, it's part of "starting backup", which covers both running
the checkpoint and the size estimation, which run one after another.
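
To make the idea concrete, here is a rough sketch of what splitting
"starting backup" into finer-grained phases could look like. This is
only an illustration, not code from the patch: the enum, the function
names, and the progress_requested flag are all made up for this
example, and only the phase strings come from this discussion.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical phase list. The identifiers are illustrative only; the
 * strings are the phase names mentioned in this thread.
 */
typedef enum BackupPhase
{
    PHASE_RUNNING_CHECKPOINT = 0,
    PHASE_ESTIMATING_BACKUP_SIZE,
    PHASE_STREAMING_DATABASE_FILES,
    PHASE_COUNT
} BackupPhase;

static const char *const phase_names[PHASE_COUNT] = {
    "running checkpoint",
    "estimating backup size",
    "streaming database files"
};

/* Return the text a progress view could show for a phase. */
const char *
backup_phase_name(BackupPhase phase)
{
    assert((int) phase >= 0 && phase < PHASE_COUNT);
    return phase_names[phase];
}

/*
 * Pick the phase that follows the checkpoint. Size estimation is
 * entered only when the client asked for it (assumed here to be a
 * simple flag corresponding to --progress).
 */
BackupPhase
phase_after_checkpoint(int progress_requested)
{
    return progress_requested
        ? PHASE_ESTIMATING_BACKUP_SIZE
        : PHASE_STREAMING_DATABASE_FILES;
}
```

With something like this, a monitoring user could tell a slow
checkpoint apart from a slow size estimation, instead of seeing a
single "starting backup" phase for both.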

I suspect people might never get stuck on "estimating backup size" as
they might on "running checkpoint", which perhaps only strengthens the
case for *always* calculating the size before sending the backup
header.

Thanks,
Amit
