| From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> | 
|---|---|
| To: | TipTop Labs <office(at)tiptop-labs(dot)com> | 
| Cc: | PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> | 
| Subject: | Re: BUG #14999: pg_rewind corrupts control file global/pg_control | 
| Date: | 2018-01-15 07:25:37 | 
| Message-ID: | 20180115072537.GC1724@paquier.xyz | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-bugs pgsql-hackers | 
On Fri, Jan 05, 2018 at 04:36:22AM +0100, TipTop Labs wrote:
>> On Jan 5, 2018, at 2:26 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Fri, Jan 5, 2018 at 5:06 AM, PG Bug reporting form
>> <noreply(at)postgresql(dot)org> wrote:
>>> I have encountered a bug in PostgreSQL 10.1: when the target directory for
>>> pg_rewind contains a read-only file (e.g. server.key), pg_rewind exits with
>>> "could not open target file" (legitimate) and corrupts the control file
>>> global/pg_control to size 0 (bug). From now on, pg_rewind always exits with
>>> "unexpected control file size 0, expected 8192" and a restore from
>>> pg_basebackup is needed.
>> 
>> Likely that's reproducible down to 9.5 where pg_rewind has been
>> introduced. I agree that we should do better with failure handling
>> here. Corrupting the control file is not cool.
> 
> I can already confirm that this also occurs with PostgreSQL 9.6.
As far as I can see, this happens because of the way the 'remote' mode
of pg_rewind considers a set of files to truncate or not. Attached is a
patch which does things in a more correct way. The key point here is to
make the difference between a relation file and a configuration file
when building the filemap for copying a file in full. When a
configuration file is not readable, then trying to open it should not be 
a failure. When using a relation file, there should be failures. There
are still two things I am not completely happy about:
- pg_control is considered as a configuration file with this patch,
however we should fail it pg_control is not readable. In practice I
guess that this should not happen, and the patch produces a warning
message. I think that we should consider add a special handling for
things like pg_control, filenode.map. etc. so as you get a hard failure
if those are not readable, so they should enter in the category of
FILE_ACTION_COPY_DATA. This needs a bit more thoughts in
process_source_file().
- open_target_file() resets manually errno. This is necessary as this
gets called continuously when processing the same file in remote mode. I
tried as well to make open_target_file and close_target_file one-time
operations for each file, but the result was even more ugly, so I went
up with this solution.
I am adding that to the next CF to not forget about it. This approach
is back-patchable down to 9.5 in my opinion. I have added as well a TAP
test in the patch which is able to reproduce the problem.
Thoughts?
--
Michael
| Attachment | Content-Type | Size | 
|---|---|---|
| rewind-readonly-fix-v1.patch | text/x-diff | 10.3 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David G. Johnston | 2018-01-15 07:36:40 | BUG #15010: Sequence ID is getting skipped | 
| Previous Message | PG Bug reporting form | 2018-01-15 06:43:41 | BUG #15010: Sequence ID is getting skipped | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Etsuro Fujita | 2018-01-15 08:09:19 | Re: [HACKERS] postgres_fdw bug in 9.6 | 
| Previous Message | Haribabu Kommi | 2018-01-15 06:51:58 | Re: Enhance pg_stat_wal_receiver view to display connected host |