Re: WAL consistency check facility

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: WAL consistency check facility
Date: 2016-12-21 17:23:16
Message-ID: CA+TgmoYVRmKySQLiv-+M4ho_JcN_de6P72CLUhFFnAJOVkVG5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 28, 2016 at 11:31 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Moved to CF 2017-01, as no committers have showed up yet :(

Seeing no other volunteers, here I am.

On a first read-through of this patch -- I have not studied it in
detail yet -- this looks pretty good to me. One concern is that this
patch adds a bit of code to XLogInsert(), which is a very hot piece of
code. Conceivably, that might produce a regression even when this is
disabled; if so, we'd probably need to make it a build-time option. I
hope that's not necessary, because I think it would be great to
compile this into the server by default, but we better make sure it's
not a problem. A bulk load into an existing table might be a good
test case.

Aside from that, I think the biggest issue here is that the masking
functions are virtually free of comments, whereas I think they should
have extensive and detailed comments. For example, in heap_mask, you
have this:

+ page_htup->t_infomask =
+ HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID |
+ HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;

For something like this, you could write "We want to ignore
differences in hint bits, since they can be set by SetHintBits without
emitting WAL. Force them all to be set so that we don't notice
discrepancies." Actually, though, I think that you could be a bit
more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID =
HEAP_XMIN_FROZEN, so maybe what you should do is clear
HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if
one is set but not both.

Anyway, leaving that aside, I think every single change that gets
masked in every single masking routine needs a similar comment,
explaining why that change can happen on the master without also
happening on the standby and hopefully referring to the code that
makes that unlogged change.

I think wal_consistency_checking, as proposed by Peter, is better than
wal_consistency_check, as implemented.

Having StartupXLOG() call pfree() on the masking buffers is a waste of
code. The process is going to exit anyway.

+ "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u",

Primary error messages aren't capitalized.

+ if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
+ {
+ /* Caller specified a bogus block_id. Do nothing. */
+ continue;
+ }

Why would the caller do something so dastardly?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-12-21 17:27:31 Re: Getting rid of "unknown error" in dblink and postgres_fdw
Previous Message Joe Conway 2016-12-21 17:22:25 Re: Getting rid of "unknown error" in dblink and postgres_fdw