Re: BUG #14999: pg_rewind corrupts control file global/pg_control

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: TipTop Labs <office(at)tiptop-labs(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Date: 2018-04-07 15:51:32
Message-ID: 15971.1523116292@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Thinking about this some more, I wondered if useful behavior with
read-only config files would be to check whether they have the same
contents on both servers, and give a warning or error if not.

Now, that would only be useful if you suppose that they actually should be
the same on both. For the particular case of server private keys, maybe
they typically would be different. So I'm not real sure about this idea,
but wanted to toss it out for discussion.

In any case, after another look at the code, it seems to me that it'd be
pretty easy to get pg_rewind to notice at the start whether a target file
it plans to overwrite is read-only: process_source_file is already doing
an lstat per file, so I think a permissions check there wouldn't add much
overhead. So at that point we could either bail out without damage, or
decide to skip the file. We could still lose if permissions change midway
through the run, but that's the sort of situation I think is OK to fail
in.

Also, I'm having second thoughts about the usefulness of adding an ORDER
BY to the fetch query just on (untested) performance grounds. It looks
to me like the chunks within a file already will be inserted into the
fetchchunks table in order, and I think within-file order is all that's
worth worrying about. In principle the remote server might read out the
rows in some other order than we stored them, but since fetchchunks is a
temp table and not subject to synchronize_seqscans, I don't think we
really need to worry about that.

Furthermore, the patch as given has got a serious performance gotcha:

sql =
"SELECT path, begin,\n"
" pg_read_binary_file(path, begin, len, true) AS chunk\n"
- "FROM fetchchunks\n";
+ "FROM fetchchunks\n"
+ "ORDER BY path, begin\n";

While 9.6 and later will do this in a sane way, older versions will
execute pg_read_binary_file() ahead of the sort step, like so:

# explain verbose SELECT path, begin, pg_read_binary_file(path, begin, len, true) AS chunk FROM fetchchunks ORDER BY path, begin;
QUERY PLAN
-------------------------------------------------------------------------------------
Sort (cost=74639.34..75889.41 rows=500025 width=44)
Output: path, begin, (pg_read_binary_file(path, begin, (len)::bigint, true))
Sort Key: fetchchunks.path, fetchchunks.begin
-> Seq Scan on pg_temp_2.fetchchunks (cost=0.00..11925.38 rows=500025 width=44)
Output: path, begin, pg_read_binary_file(path, begin, (len)::bigint, true)
(5 rows)

with the effect that we're passing the entire contents of the source data
directory (or at least, as much of it as pg_rewind needs to read) through
the sort. Yipes. So it'd be safer to do it like this:

# explain verbose SELECT path, begin, pg_read_binary_file(path, begin, len, true) AS chunk FROM (select * from fetchchunks ORDER BY path, begin) ss;
QUERY PLAN
---------------------------------------------------------------------------------------------
Subquery Scan on ss (cost=72139.22..80889.66 rows=500025 width=44)
Output: ss.path, ss.begin, pg_read_binary_file(ss.path, ss.begin, (ss.len)::bigint, true)
-> Sort (cost=72139.22..73389.28 rows=500025 width=44)
Output: fetchchunks.path, fetchchunks.begin, fetchchunks.len
Sort Key: fetchchunks.path, fetchchunks.begin
-> Seq Scan on pg_temp_2.fetchchunks (cost=0.00..9425.25 rows=500025 width=44)
Output: fetchchunks.path, fetchchunks.begin, fetchchunks.len
(7 rows)

But the real bottom line here is I don't feel a need to touch the fetch
query at all without some actual performance measurements proving there's
a benefit.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2018-04-07 22:22:58 Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Previous Message Tom Lane 2018-04-07 14:14:18 Re: BUG #15145: date time default value issues

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2018-04-07 16:06:23 Re: [HACKERS] pgbench - allow to store select results into variables
Previous Message Andres Freund 2018-04-07 15:48:45 Re: [HACKERS] path toward faster partition pruning