Re: WAL consistency check facility

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: WAL consistency check facility
Date: 2016-09-11 20:06:35
Message-ID: CAGz5QCLC=0jbmGag-BaRPYAAVu9CN0b-scOEPeCGBe6Rhh8YVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello,

Based on the previous discussions, I've modified the existing patch.

>+ void (*rm_checkConsistency) (XLogReaderState *record);
>All your _checkConsistency functions share the same pattern, in short
>they all use a for loop for each block, call each time
>XLogReadBufferExtended, etc. And this leads to a *lot* of duplication.
>You would get a reduction by a couple of hundreds of lines by having a
>smarter refactoring. And to be honest, if I look at your patch what I
>think is the correct way of doing things is to add to the rmgr not
>this check consistency function, but just a pointer to the masking
>function.
+1. In rmgrlist, I've added a pointer to the masking function for each rmid.
A common function named checkConsistency calls these masking functions
based on their rmid and does comparison for each block.

>> - If WAL consistency check is enabled for a rmgrID, we always include
>> the backup image in the WAL record.
>
>What happens if wal_consistency has different settings on a standby
>and its master? If for example it is set to 'all' on the standby, and
>'none' on the master, or vice-versa, how do things react? An update of
>this parameter should be WAL-logged, no?
If wal_consistency is enabled for a rmid, standby will always check whether
backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not.
(I guess Amit and Robert also suggested the same in the thread)
Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and
BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup
image is required during redo. When we decode a wal record, has_image
flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO.

>+ if (pg_strcasecmp(tok, "Heap2") == 0)
>+ {
>+ newwalconsistency[RM_HEAP2_ID] = true;
>+ }
>Thinking more about it, I guess that we had better change the
>definition list of rmgrs in rmgr.h and get something closer to
>RmgrDescData that pg_xlogdump has to avoid all this stanza by
>completing it with the name of the rmgr. The only special cases that
>this code path would need to take care of would be then 'none' and
>'all'. You could do this refactoring on top of the main patch to
>simplify it as it is rather big (1.7k lines).
I've modified it exactly like pg_xlogdump does. Additionally, it checks
whether masking function is defined for the rmid or not. Hence, in future,
if we want to include any other rmid for wal consistency check, we just need
to define its masking function.

>> - In recovery tests (src/test/recovery/t), I've added wal_consistency
>> parameter in the existing scripts. This feature doesn't change the
>> expected output. If there is any inconsistency, it can be verified in
>> corresponding log file.
>
>I am afraid that just generating a WARNING message is going to be
>useless for the buildfarm. If we want to detect errors, we could for
>example have an additional GUC to trigger an ERROR or a FATAL, taking
>down the cluster, and allowing things to show in red on a platform.
For now, I've kept this as a WARNING message to detect all inconsistencies
at once. Once, the patch is finalized, I'll modify it as an ERROR message.

Thoughts?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
walconsistency_v7_base_commit_ID_40b449a.patch text/x-patch 37.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2016-09-11 20:52:30 Re: [REVIEW] Tab Completion for CREATE DATABASE ... TEMPLATE ...
Previous Message Tom Lane 2016-09-11 18:25:41 Re: [REVIEW] Tab Completion for CREATE DATABASE ... TEMPLATE ...