Re: BUG #14999: pg_rewind corrupts control file global/pg_control

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: office(at)tiptop-labs(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Date: 2018-01-05 01:26:37
Message-ID: CAB7nPqTyq3cjphJmoCPc0n8r90O4j1BpZC2BMaj8dKZ7g0-mtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Fri, Jan 5, 2018 at 5:06 AM, PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> I have encountered a bug in PostgreSQL 10.1: when the target directory for
> pg_rewind contains a read-only file (e.g. server.key), pg_rewind exits with
> "could not open target file" (legitimate) and corrupts the control file
> global/pg_control to size 0 (bug). From now on, pg_rewind always exits with
> "unexpected control file size 0, expected 8192" and a restore from
> pg_basebackup is needed.

Likely that's reproducible down to 9.5 where pg_rewind has been
introduced. I agree that we should do better with failure handling
here. Corrupting the control file is not cool.

> A patch for branch REL_10_STABLE of repository
> https://github.com/postgres/postgres, a README, and Dockerfiles for
> demonstrating both bug and patch are available from
> https://github.com/tiptop-labs/postgres-patches .

You may not know when github.com is gone, which would cause the data
you are attaching here to go away. What looks interesting is
pg_rewind.patch, which is what you are proposing as a fix, right? I am
attaching it here for PG archives.

- open_target_file(filename, false);
+ /* Trunc target file for action FILE_ACTION_COPY. */
+ open_target_file(filename, chunkoff == 0);

write_target_range(chunk, chunkoff, chunksize);
Hm. Let me think more about that as there are quite a few
distributions that link to SSL files that cannot be written, and
pg_rewind copies all configuration files in full.

- "FROM fetchchunks\n";
+ "FROM fetchchunks\n"
+ "ORDER BY path, begin\n";
If this is aimed at improving the performance of pg_rewind by making
chunk writes more sequential, it should be a different patch. I would
see value in that. If this is aimed to change the order of the files
to avoid the write corruption, this is wrong.

I would imagine that you could see something similar with the offline
mode, haven't tested yet though.
--
Michael

Attachment Content-Type Size
pg_rewind.patch application/octet-stream 1.1 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message TipTop Labs 2018-01-05 03:36:22 Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Previous Message Alexander Korotkov 2018-01-04 21:25:52 Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-01-05 01:45:37 Re: bug? import foreign schema forgets to import column description
Previous Message David Rowley 2018-01-05 01:13:39 Re: [HACKERS] path toward faster partition pruning