Re: BUG #14999: pg_rewind corrupts control file global/pg_control

From: TipTop Labs <office(at)tiptop-labs(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Date: 2018-02-26 16:28:53
Message-ID: D2684F73-AE17-453B-ACB5-9CF0A7F6856E@tiptop-labs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

1. I can confirm that your patch is effective also in my Docker-based test setup and with the current REL_10_STABLE code base (i.e. PostgreSQL 10.2).

2. Your patch is more encompassing than the one I had submitted earlier, and it may be the right way to go. It is cleaner but more "complicated" in that it may require enlisting/recognizing all those special files (pg_control, filenode.map, etc). IMO the earlier patch would already/tolerate handle those, because the distinction it makes is not based on whether something is a configuration file, but purely on whether it is writable.

3. Sorry for the late response :)

> On Jan 15, 2018, at 8:25 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>
> On Fri, Jan 05, 2018 at 04:36:22AM +0100, TipTop Labs wrote:
>>> On Jan 5, 2018, at 2:26 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Fri, Jan 5, 2018 at 5:06 AM, PG Bug reporting form
>>> <noreply(at)postgresql(dot)org> wrote:
>>>> I have encountered a bug in PostgreSQL 10.1: when the target directory for
>>>> pg_rewind contains a read-only file (e.g. server.key), pg_rewind exits with
>>>> "could not open target file" (legitimate) and corrupts the control file
>>>> global/pg_control to size 0 (bug). From now on, pg_rewind always exits with
>>>> "unexpected control file size 0, expected 8192" and a restore from
>>>> pg_basebackup is needed.
>>>
>>> Likely that's reproducible down to 9.5 where pg_rewind has been
>>> introduced. I agree that we should do better with failure handling
>>> here. Corrupting the control file is not cool.
>>
>> I can already confirm that this also occurs with PostgreSQL 9.6.
>
> As far as I can see, this happens because of the way the 'remote' mode
> of pg_rewind considers a set of files to truncate or not. Attached is a
> patch which does things in a more correct way. The key point here is to
> make the difference between a relation file and a configuration file
> when building the filemap for copying a file in full. When a
> configuration file is not readable, then trying to open it should not be
> a failure. When using a relation file, there should be failures. There
> are still two things I am not completely happy about:
> - pg_control is considered as a configuration file with this patch,
> however we should fail it pg_control is not readable. In practice I
> guess that this should not happen, and the patch produces a warning
> message. I think that we should consider add a special handling for
> things like pg_control, filenode.map. etc. so as you get a hard failure
> if those are not readable, so they should enter in the category of
> FILE_ACTION_COPY_DATA. This needs a bit more thoughts in
> process_source_file().
> - open_target_file() resets manually errno. This is necessary as this
> gets called continuously when processing the same file in remote mode. I
> tried as well to make open_target_file and close_target_file one-time
> operations for each file, but the result was even more ugly, so I went
> up with this solution.
>
> I am adding that to the next CF to not forget about it. This approach
> is back-patchable down to 9.5 in my opinion. I have added as well a TAP
> test in the patch which is able to reproduce the problem.
>
> Thoughts?
> --
> Michael
> <rewind-readonly-fix-v1.patch>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2018-02-26 17:48:29 Re: BUG #15091: to_number() returns incorrect value
Previous Message PG Bug reporting form 2018-02-26 15:45:42 BUG #15092: pg_basebackup directory checking

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2018-02-26 16:32:12 Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Previous Message Nikita Glukhov 2018-02-26 15:53:23 Re: SQL/JSON: JSON_TABLE