Re: WAL consistency check facility

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: WAL consistency check facility
Date: 2016-10-28 06:05:02
Message-ID: CAB7nPqR=OcojLCP=1Ho6Zo312CKzUZU8d4aJO+VvpUYV-waU_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 27, 2016 at 5:08 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Sep 29, 2016 at 12:49 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> Seeing nothing happening, I have moved the patch to next CF as there
>> is a new version, but no reviews for it.
>
> Just a note for anybody potentially looking at this patch. I am
> currently looking at it in depth, and will post a new version of the
> patch in a couple of days with review comments. Thanks.

And here we go. Here is a review as well as a large brush-up for this
patch. A couple of things:
- wal_consistency is using a list of RMGRs, at the cost of being
PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I
have been thinking hard about that, and still I don't see the point).
It is rather easy to for example default it to false, and enable it to
true to check if a certain code path is correctly exercised or not for
WAL consistency. Note that this simplification reduces the patch size
by 100~150 lines. I know, I know, I'd expect some complains about
that....
- Looking for wal_consistency at replay has no real value. What if on
a standby the parameter value is inconsistent than the one on the
master? This logic adds a whole new level of complications and
potential bugs. So instead my suggestion is to add a marker at WAL
record level to check if this record should be checked for consistency
at replay or not. This is also quite flexible if you think about it,
the standby is made independent of the WAL generated on the master and
just applies, or checks what it sees is fit checking for. The best
match here is to add a flag for XLogRecord->xl_info and make use of
one of the low 4 bits and only one is used now for
XLR_SPECIAL_REL_UPDATE. An interesting side effect of this approach is
that callers of XLogInsert can set XLR_CHECK_CONSISTENCY to enforce a
consistency check even if wal_consistency is off. It is true that we
could register such a data via XLogRegisterBuffer() instead, though
the 4 bits with the BKPBLOCK_* flags are already occupied so that
would induce a record penalty length and I have a hard time believing
that one would like to check the consistency of a record in
particular.
- Speaking of which using BKPIMAGE_IS_REQUIRED_FOR_REDO stored in the
block definition is sort of weird because we want to know if
consistency should be checked at a higher level.
- in maskPage, the new rmgr routine, there is no need for the info and
blkno arguments. info is not used at all to begin with. blkno is used
for gin pages to detect meta pages but this can be guessed using the
opaque pointer. For heap pages and speculative inserts, masking the
blkno would be fine. That's not worth it.
- Instead of palloc'ing the old and new pages to compare, it would be
more performant to keep around two static buffers worth of BLCKSZ and
just use that. This way there is no need as well to perform any palloc
calls in the masking functions, limiting the risk of errors (those
code paths had better avoid errors IMO). It would be also less costly
to just pass to the masking function a pointer to a buffer of size
BLCKSZ and just do the masking on it.
- The masking routine names can be more generic, like XXX_mask(char
*page). No need to say page, we already know they work on it via the
argument provided.
- mask_xlog_page and mask_generic_page are useless as the block
restored comes directly from a FPW, so you are comparing basically a
FPW with itself.
- In checkConsistency, there is no need to allocate the old page. As
RestorebackupImage stores the data in an already allocated buffer, you
can reuse the same location as the buffer masked afterwards.
- Removed comparePages(), using memcmp instead for simplicity(). This
does not show up the exact location of the inconsistency, still that
won't be a win as there could be more than one inconsistency across a
page. So this gives an invitation to user to look at the exact
context. memcmp can be used anyway to understand where is the
inconsistency if need be.
- I have noticed that mask_common_page is meaningfull just for the
sequence RMGR, and just that does not justify its existence so I
ripped it off.
- PostgresNode.pm enables wal_consistency. Seeing the high amount of
WAL this produces, I feel cold about doing that, the patch does
include it btw...
- Standbys now stop with FATAL when an inconsistency is found. This
makes error detection easier on buildfarm machines.
- A couple of masking functions still use 0xFFFFFF or similar marks.
Those should be replaced by MASK_MARKING. Not done that yet.
- Some of the masking routines should be refined, particularly the
heap and GIn functions. I did not spend time yet to do it.

On top of that, I have done a fair amount of testing, creating
manually some inconsistencies in the REDO routines to trigger failures
on standbys. And that was sort of fun to break things intentionally.

Another fun thing is the large amount of WAL that this generates (!),
so anyone willing to enable that in production would be crazy.
Enabling it for development and/or session is something that would
clearly help.

I am sending back the patch as waiting on author. Attached is what I
have up to now.
--
Michael

Attachment Content-Type Size
walconsistency_v9.patch text/x-diff 39.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-10-28 06:46:31 Re: Proposal: scan key push down to heap [WIP]
Previous Message Haribabu Kommi 2016-10-28 05:55:51 Re: pg_hba_file_settings view patch