Re: [HACKERS] pg_basebackup --progress output for batch execution

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Martín Marqués <martin(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] pg_basebackup --progress output for batch execution
Date: 2017-11-19 02:03:41
Message-ID: 8bdb8be9-d6c0-ee5f-3641-be165169cdee@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/10/2017 02:32 PM, Martín Marqués wrote:
> Hi,
>
> Thanks for having a look at this patch.
>
> 2017-11-09 20:55 GMT-03:00 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>:
>> On Fri, Sep 29, 2017 at 4:00 PM, Martin Marques
>> <martin(dot)marques(at)2ndquadrant(dot)com> wrote:
>>>
>>> Hi,
>>>
>>> Some time ago I had to work on a system where I was cloning a standby
>>> using pg_basebackup, that didn't have screen or tmux. For that reason I
>>> redirected the output to a file and ran it with nohup.
>>>
>>> I normally (always actually ;) ) run pg_basebackup with --progress and
>>> --verbose so I can follow how much has been done. When done on a tty you
>>> get a nice progress bar with the percentage that has been cloned.
>>>
>>> The problem came with the execution and redirection of the output, as
>>> the --progress option will write a *very* long line!
>>>
>>> Back then I thought of writing a patch (actually someone suggested I do
>>> so) to add a --batch-mode option which would change the behavior
>>> pg_basebackup has when printing the output messages.
>>
>>
>>
>> While separate lines in the output file is better than one very long line,
>> it still doesn't seem so useful. If you aren't watching it in real time,
>> then you really need to have a timestamp on each line so that you can
>> interpret the output. The lines are about one second apart, but I don't
>> know robust that timing is under all conditions.
>
> I kind of disagree with your view here.
>
> If the cloning process takes many hours to complete (in my case, it
> was around 12 hours IIRC) you might want to peak at the log every now
> and then with tail.
>
> I do agree on adding a timestamp prefix to each line, as it's not
> clear from the code how often progress_report() is called.
>
>> I think I agree with Arthur that I'd rather have the decision made by
>> inspecting whether output is going to a tty, rather than by adding another
>> command line option. But maybe that is not detected robustly enough across
>> all platforms and circumstances?
>
> In this case, Arthur's idea is good, but would make the patch less
> generic/flexible for the end user.
>

I'm not quite sure about that. We're not getting the flexibility for
free, but for additional complexity (additional command-line option).
And what valuable capability would we (or the users) loose, really, by
relying on the isatty call?

So what use case is possible with --batch-mode but not with isatty (e.g.
when combined with tee)?

>
> That's why I tried to reproduce what top does when executed with -b
> (Batch mode operation). There, it's the end user who decides how the
> output is formatted (well, saying it decides on formatting a bit of
> an overstatement, but you get it ;) )
>

In 'top' a batch mode also disables input, and runs for a certain number
of interactions (as determined by '-n' option). That's not the case in
pg_basebackup, where it only adds the extra newline.

>
> An example where using isatty() might fail is if you run
> pg_basebackup from a tty but redirect the output to a file, I
> believe that in that case isatty() will return true, but it's very
> likely that the user might want batch mode output.
>

IMHO that should work correctly, as already explained by Arthur.

>
> But maybe we should also add Arthurs idea anyway (when not in batch
> mode), as it seems pretty lame to output progress in one line if you
> are not in a tty.
>

I'd say to just use isatty, unless we can come up with a compelling use
case that is not satisfied by it.

And if we end up using --batch-mode, perhaps we should only allow it
when --progress is used. It's the only thing it affects, and it makes no
difference when used without it.

Furthermore, I propose to reword this:

<para>
Runs <command>pg_basebackup</command> in batch mode. This is useful
if the output is to be pipped so the other end of the pipe reads each
line.
</para>
<para>
Using this option with <option>--progress</option> will result in
printing each progress output with a newline at the end, instead of a
carrige return.
</para>

like this:

<para>
Runs <command>pg_basebackup</command> in batch mode, in which
<option>--progress</option> terminates the lines with a regular
newline instead of carriage return.
</para>
<para>
This is useful if the output is redirected to a file or a pipe,
instead of a plain console.
</para>

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2017-11-19 02:10:10 Re: percentile value check can be slow
Previous Message Tomas Vondra 2017-11-19 01:30:57 Re: [HACKERS] new function for tsquery creartion