Re: WAL consistency check facility

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: WAL consistency check facility
Date: 2016-09-08 07:50:40
Message-ID: CAB7nPqRSMr3UtLYzAMfsH__7yZs6Fn_RksjL+rY9GPNfhtVoGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 7, 2016 at 7:22 PM, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
> Hello,

Could you avoid top-posting please? More reference here:
http://www.idallen.com/topposting.html

> - If WAL consistency check is enabled for a rmgrID, we always include
> the backup image in the WAL record.

What happens if wal_consistency has different settings on a standby
and its master? If for example it is set to 'all' on the standby, and
'none' on the master, or vice-versa, how do things react? An update of
this parameter should be WAL-logged, no?

> - I've extended the RmgrTable with a new function pointer
> rm_checkConsistency, which is called after rm_redo. (only when WAL
> consistency check is enabled for this rmgrID)
> - In each rm_checkConsistency, both backup pages and buffer pages are
> masked accordingly before any comparison.

This leads to heavy code duplication...

> - In postgresql.conf, a new guc variable named 'wal_consistency' is
> added. Default value of this variable is 'None'. Valid values are
> combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist,
> BRIN, Generic and XLOG. It can also be set to 'All' to enable all the
> values.

Lower-case is the usual policy for parameter values for GUC parameters.

> - In recovery tests (src/test/recovery/t), I've added wal_consistency
> parameter in the existing scripts. This feature doesn't change the
> expected output. If there is any inconsistency, it can be verified in
> corresponding log file.

I am afraid that just generating a WARNING message is going to be
useless for the buildfarm. If we want to detect errors, we could for
example have an additional GUC to trigger an ERROR or a FATAL, taking
down the cluster, and allowing things to show in red on a platform.

> Results
> ------------------------
>
> I've tested with installcheck and installcheck-world in master-standby
> set-up. Followings are the configuration parameters.

So you tested as well the recovery tests, right?

> I got two types of inconsistencies as following:
>
> 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup
> page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after
> redo, only BTP_DELETED flag is set in buffer page. I assume that we
> should clear all btpo_flags before setting BTP_DELETED in
> _bt_unlink_halfdead_page().

The page is deleted, it does not matter, so you could just mask all
the flags for a deleted page...
[...]
+ /*
+ * Mask everything on a DELETED page.
+ */
+ if (((BTPageOpaque) PageGetSpecialPointer(page_norm))->btpo_flags
& BTP_DELETED)
And that's what is happening.

> 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in
> REVMAP page. This happens only for two cases. I'm not sure what the
> reason can be.

Hm? This smells like a block reference bug. What are the cases you are
referring to?

> I haven't done sufficient tests yet to measure the overhead of this
> modification. I'll do that next.

I did a first pass on your patch, and I think that things could be
really reduced. There is much code duplication, but see below for the
details..

#include "access/xlogutils.h"
-
+#include "storage/bufmask.h"
I know that I am a noisy one on the matter, but please double-check
for such useless noise in your patch. And there is not only one.

+ newwalconsistency = (bool *) guc_malloc(ERROR,(RM_MAX_ID + 1)*sizeof(bool));
This spacing is not project-style. You may want to go through that:
https://www.postgresql.org/docs/devel/static/source.html

+$node_master->append_conf(
+ 'postgresql.conf', qq(
+wal_consistency = 'All'
+));
Instead of duplicating that 7 times, you could just do it once in the
init() method of PostgresNode.pm. This really has meaning if enabled
by default.

+ /*
+ * Followings are the rmids which can have backup blocks.
+ * We'll enable this feature only for these rmids.
+ */
+ newwalconsistency[RM_HEAP2_ID] = true;
+ newwalconsistency[RM_HEAP_ID] = true;
+ newwalconsistency[RM_BTREE_ID] = true;
+ newwalconsistency[RM_HASH_ID] = true;
+ newwalconsistency[RM_GIN_ID] = true;
+ newwalconsistency[RM_GIST_ID] = true;
+ newwalconsistency[RM_SEQ_ID] = true;
+ newwalconsistency[RM_SPGIST_ID] = true;
+ newwalconsistency[RM_BRIN_ID] = true;
+ newwalconsistency[RM_GENERIC_ID] = true;
+ newwalconsistency[RM_XLOG_ID] = true;
Here you can just use MemSet with RM_MAX_ID and simplify this code maintenance.

+ for(i = 0; i < RM_MAX_ID + 1 ; i++)
+ newwalconsistency[i] = false;
+ break;
Same here you can just use MemSet.

+ else if (pg_strcasecmp(tok, "NONE") == 0)
[...]
+ else if (pg_strcasecmp(tok, "ALL") == 0)
It seems to me that using NONE or ALL with any other keywords should
not be allowed.

+ if (pg_strcasecmp(tok, "Heap2") == 0)
+ {
+ newwalconsistency[RM_HEAP2_ID] = true;
+ }
Thinking more about it, I guess that we had better change the
definition list of rmgrs in rmgr.h and get something closer to
RmgrDescData that pg_xlogdump has to avoid all this stanza by
completing it with the name of the rmgr. The only special cases that
this code path would need to take care of would be then 'none' and
'all'. You could do this refactoring on top of the main patch to
simplify it as it is rather big (1.7k lines).

+ if (inconsistent_loc < BLCKSZ)
+ elog(WARNING,
+ "Inconsistent page (at byte %u) found, rel %u/%u/%u, "
+ "forknum %u, blkno %u", inconsistent_loc,
+ rnode.spcNode, rnode.dbNode, rnode.relNode,
+ forknum, blkno);
+ else
+ elog(DEBUG1,
+ "Consistent page found, rel %u/%u/%u, "
+ "forknum %u, blkno %u",
+ rnode.spcNode, rnode.dbNode, rnode.relNode,
+ forknum, blkno);
This is going to be very chatty. Perhaps the elog level should be raised?

-#define SEQ_MAGIC 0x1717
-
-typedef struct sequence_magic
-{
- uint32 magic;
-} sequence_magic;
You do not need this refactoring anymore.

+ void (*rm_checkConsistency) (XLogReaderState *record);
All your _checkConsistency functions share the same pattern, in short
they all use a for loop for each block, call each time
XLogReadBufferExtended, etc. And this leads to a *lot* of duplication.
You would get a reduction by a couple of hundreds of lines by having a
smarter refactoring. And to be honest, if I look at your patch what I
think is the correct way of doing things is to add to the rmgr not
this check consistency function, but just a pointer to the masking
function.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2016-09-08 08:03:42 Re: Optimization for lazy_scan_heap
Previous Message Vik Fearing 2016-09-08 07:22:40 Re: Sample configuration files