Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-19 10:21:06
Message-ID: CA+Tgmoa6tRN-dDAEygmmr9K5HiiZ6xt=g0e-3E7o=TGAuAwm+w@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 18, 2017 at 4:52 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I have my doubts about this actually addressing gitlab-like mistakes,
> though, because it's a helluva jump from "It's waiting and not doing
> anything," to "We need to remove the datadir." (One of the reasons being
> that non-empty directory is a local issue, and there's no reason why the
> tool should wait instead of just reporting an error.)

It's pretty clear that the gitlab postmortem involves multiple people
making multiple serious errors, including failing to test that the
ostensible backups could actually be restored. I was taught that rule
#1 as far as backups are concerned is to test that you can restore
them, so that seems like a big miss. However, I don't think the fact
they made other mistakes is a reason not to improve the things we can
improve and, certainly, having some way for pg_basebackup to tell you
that it's waiting for the master to checkpoint will help the next
person who is confused by that particular thing. That person may go
on to be confused by something else, but then again maybe not.
Improving the reporting in this case stands on its own merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
