Re: WAL consistency check facility

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: WAL consistency check facility
Date: 2017-02-08 20:56:00
Message-ID: CA+TgmoZyFW5vKc=Yi_r1omn8Py54P3HHvAwt_Ne=KZABmy6u4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
> Thank you Robert for the review. Please find the updated patch in the
> attachment.

I have committed this patch after fairly extensive revisions:

* Rewrote the documentation to give some idea what the underlying
mechanism of operation of the feature is, so that users who choose to
enable this will hopefully have some understanding of what they've
turned on.
* Renamed "char *page" arguments to "char *pagedata" and "Page page",
because tempPage doesn't seem to be to be any better a name than
page_norm.
* Moved bufmask.c to src/backend/access/common, because there's no
code in src/backend/storage/buffer that knows anything about the
format of pages; that is the job of AMs, hence src/backend/access.
* Improved some comments in bufmask.c
* Removed consistencyCheck_is_enabled in favor of determining which
RMs support masking by the presence of absence of an rm_mask function.
* Removed assertion in checkXLogConsistency that consistency checking
is enabled for the RM; that's trivially false if
wal_consistency_checking is not the same on the master and the
standby. (Note that quite apart from the issue of whether this
function should exist at all, adding it to a header file after the
closing #endif guard is certainly not right.)
* Changed checkXLogConsistency to use RBM_NORMAL_NO_LOG instead of
RBM_NORMAL. I'm not sure if there are any cases where this makes a
difference, but it seems safer.
* Changed checkXLogConsistency to skip pages whose LSN is newer than
that of the record. Without this, if you shut down recovery and
restart it, it complains of inconsistent pages and dies. (I'm not
sure this is the only scenario that needs to be covered; it would be
smart to do more testing of restarting the standby.)
* Made wal_consistency_checking a developer option instead of a WAL
option. Even though it CAN be used in production, we don't
particularly want to encourage that; enabling WAL consistency checking
has a big performance cost and makes your system more fragile not less
-- a WAL consistency failure causes your standby to die a hard death.
(Maybe there should be a way to suppress consistency checking on the
standby -- but I think not just by requiring wal_consistency_checking
on both ends. Or maybe we should just downgrade the FATAL to WARNING;
blowing up the standby irrevocably seems like poor behavior.)
* Coding style improvement in check_wal_consistency_checking.
* Removed commas in messages added to pg_xlogdump; those didn't look
good to me, on further review.
* Comment improvements in xlog_internal.h and xlogreader.h

I also bumped XLOG_PAGE_MAGIC (which is normally done by the
committer, not the patch author, so I wasn't expecting that to be in
the patch as submitted).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-08 21:12:54 Re: Cannot shutdown subscriber after DROP SUBSCRIPTION
Previous Message Tom Lane 2017-02-08 20:52:07 Re: Patch: Avoid precision error in to_timestamp().