Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-17 23:22:20
Message-ID: c869c5ba-0aa2-06df-4e1d-60169cf7e230@2ndquadrant.com
Lists: pgsql-hackers

On 02/17/2017 08:17 PM, Jim Nasby wrote:
> On 2/14/17 5:18 PM, Robert Haas wrote:
>> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>> <alvherre(at)2ndquadrant(dot)com> wrote:
>>> I'd rather have a --quiet mode instead. If you're running it by hand,
>>> you're likely to omit the switch, whereas when writing the cron job
>>> you're going to notice lack of switch even before you let the job run
>>> once.
>>
>> Well, that might've been a better way to design it, but changing it
>> now would break backward compatibility and I'm not really sure that's
>
> Meh... it's really only going to affect cronjobs or scripts, which are
> easy enough to fix, and you're not going to have that many of them (or
> if you do you certainly have an automated way to push the update).
>

I think you're underestimating the breakage and overestimating how easy
it's going to be to fix it. It's true we'd only change this in a major
version, so people should assume possible breakage and test.

>> a good idea. Even if it is, it's a separate concern from whether or
>> not in the less-quiet mode we should point out that we're waiting for
>> a checkpoint on the server side.
>
> Well, --quiet was suggested because of confusion from pg_basebackup
> twiddling its thumbs...

I'm in favor of the '--verbose' route. People are used to that when
investigating issues, and it does not break existing cron jobs. I can
live with --quiet though, as long as we don't resort to some craziness
along the lines of "if there's a tty be verbose, otherwise be quiet".

I have my doubts about this actually addressing gitlab-like mistakes,
though, because it's a helluva jump from "It's waiting and not doing
anything," to "We need to remove the datadir." (One of the reasons being
that a non-empty directory is a local issue, and there's no reason why
the tool should wait instead of just reporting an error.)

FWIW before messing with the pg_basebackup code, perhaps we should
improve the documentation and explain clearly the meaning of 'fast' and
'spread' checkpoint modes. Right now, pg_basebackup docs only say this:

Sets checkpoint mode to fast or spread (default) (see Section 24.3.3).

which is pretty damn useless when you're investigating an issue. And
the referenced section (Making a Base Backup Using the Low Level API)
does not clearly explain how this maps to pg_start_backup(_,?).

What about adding a paragraph to the pg_basebackup docs, explaining that
with 'fast' it requests an immediate checkpoint, while with 'spread'
it'll wait for a spread checkpoint?
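
To illustrate the mapping such a paragraph could spell out, here is a
minimal sketch in Python. The helper name start_backup_sql is made up
for this example and is not part of PostgreSQL. It assumes the
two-argument form pg_start_backup(label, fast), where fast = true asks
for an immediate checkpoint and fast = false waits for a spread one. As
far as I know pg_basebackup itself goes through the replication
protocol rather than this SQL function, so treat it as an analogy only.

```python
# Hypothetical helper (not part of PostgreSQL): shows how the
# pg_basebackup --checkpoint value would correspond to the 'fast'
# argument of the low-level pg_start_backup(label, fast) call.
def start_backup_sql(label: str, checkpoint_mode: str = "spread") -> str:
    if checkpoint_mode not in ("fast", "spread"):
        raise ValueError(f"unknown checkpoint mode: {checkpoint_mode!r}")
    # 'fast'   -> immediate checkpoint, the backup starts right away
    # 'spread' -> wait for a checkpoint spread out over time (default)
    fast = "true" if checkpoint_mode == "fast" else "false"
    escaped = label.replace("'", "''")  # escape quotes for a SQL literal
    return f"SELECT pg_start_backup('{escaped}', {fast});"
```

With the default 'spread' mode the call can sit there for a while until
the checkpoint completes, which is exactly the "waiting and not doing
anything" phase discussed above.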

regards

-- Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
