Re: BUG #15346: Replica fails to start after the crash

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-28 12:59:45
Message-ID: 20180828125945.GG29157@paquier.xyz
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 28, 2018 at 08:23:11AM -0400, Stephen Frost wrote:
> * Andres Freund (andres(at)anarazel(dot)de) wrote:
>> Uh, where is that "control file last" bit coming from?
>
> pg_basebackup copies it last. The comments should probably be improved
> as to *why* but my recollection is that it's, at least in part, to
> ensure the new cluster can't be used until it's actually a complete
> backup.

What we have now is mainly in basebackup.c. See 8366c780, which
introduced that. Stephen has that right: we cannot rely on an
end-backup record when taking a backup from a standby, so copying the
control file last ensures that the consistent point is late enough
that no other pages are inconsistent. Even with that, I think that
there is still a small race condition, but I cannot put my finger on
it now. I agree that the current comments do a bad job of explaining
why this happens. That's actually something I discovered during the
discussion that resulted in f267c1c2.
--
Michael
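
As an illustration of the ordering discussed in this message, here is a
minimal sketch of a copy plan that defers the control file until every
other file has been listed. It is not the code in basebackup.c: the
function name backup_copy_order and the idea of working from a plain list
of relative paths are assumptions made for this example. The only detail
taken from PostgreSQL itself is the control file location,
global/pg_control.

```python
from pathlib import PurePosixPath

# Location of the control file relative to the data directory.
CONTROL_FILE = PurePosixPath("global/pg_control")


def backup_copy_order(relative_paths):
    """Return the paths in their given order, with the control file moved last.

    Hypothetical helper for illustration only; the real base backup code
    streams files rather than sorting a list.
    """
    paths = [PurePosixPath(p) for p in relative_paths]
    others = [p for p in paths if p != CONTROL_FILE]
    control = [p for p in paths if p == CONTROL_FILE]
    # Everything else is copied first, so the control file reflects a
    # point at least as late as the copy of any other file.
    return [str(p) for p in others + control]


if __name__ == "__main__":
    for path in backup_copy_order(
        ["PG_VERSION", "global/pg_control", "base/1/1259"]
    ):
        print(path)
```

Deferring the control file in this way only expresses the ordering; it
does not address the possible remaining race mentioned above.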
