Re: BUG #14999: pg_rewind corrupts control file global/pg_control

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, TipTop Labs <office(at)tiptop-labs(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Date: 2018-04-02 23:27:55
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Dmitry Dolgov <9erthalion6(at)gmail(dot)com> writes:
> Ok, I agree. Then yes, this patch can be probably marked as ready.

I started to look over this patch, and soon decided that the
commentary in pg_rewind is impossibly bad :-(. You need to study
libpq_executeFileMap for a long time before you are even sure which side
of the transfer it's executing on; the fact that it's doing a copy *TO*
the remote server is incredibly misleading, and the comments are not
anywhere near good enough to de-confuse somebody coming upon the code for
the first time. Not to mention that the function's name conveys nothing
whatever. Maybe it's not the job of this patch to fix that, but I can't
help thinking that poor design and documentation are a big part of why
this bug exists in the first place.

But enough of that rant. Now that I've more or less wrapped my head
around what's happening, I think this is the core of the problem:
libpq_executeFileMap truncates every non-data file in the local data
directory before it's fetched anything at all from the remote server.
This seems to me to be utterly cavalier and brittle, because ANY FAILURE
AT ALL in the fetch sequence leaves you with a data directory that's been
trashed in whole or in part --- and that sequence could span a good long
time, if there's a lot to copy. The originally complained-of problem
(that pg_control gets nuked) is just the tip of that iceberg, and I'm
afraid that the proposed patch is only removing a single potential
failure mode.

I think that a saner design would involve not truncating any particular
target file until we first get some data to write to it. Christian's
original patch shown at
seems to be headed in that direction, in that it forces ordering of the
data fetch (which is likely a good idea anyway to ensure locality of
reference) and then performs the truncation when we get the first chunk
of data for a file.

Michael's proposal to have separate error handling for data and config
files is not bad in itself, and it does improve the behavior for one
foreseeable trouble case. But I think we need the other thing as well,
or some variant of it, before we can regard pg_rewind as anything except
a chainsaw with the safety catch missing.

regards, tom lane

In response to


Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2018-04-02 23:31:09 BUG #15141: Faulty ISO 8601 parsing
Previous Message Bruce Momjian 2018-04-02 20:24:02 Re: BUG #15112: Unable to run pg_upgrade with earthdistance extension

Browse pgsql-hackers by date

  From Date Subject
Next Message Christophe Pettus 2018-04-03 00:03:39 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Previous Message Craig Ringer 2018-04-02 23:27:35 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS