Re: WAL consistency check facility

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: WAL consistency check facility
Date: 2016-09-13 09:07:09
Message-ID: CAGz5QCKHWQCUrCbT4vjMgYPoOJJjoi2_dT=5eZVWeL=8nZT76Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>> - If WAL consistency check is enabled for a rmgrID, we always include
>>>> the backup image in the WAL record.
>>>
>>>What happens if wal_consistency has different settings on a standby
>>>and its master? If for example it is set to 'all' on the standby, and
>>>'none' on the master, or vice-versa, how do things react? An update of
>>>this parameter should be WAL-logged, no?
>>
>> If wal_consistency is enabled for a rmid, standby will always check whether
>> backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not.
>> (I guess Amit and Robert also suggested the same in the thread)
>> Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and
>> BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup
>> image is required during redo. When we decode a wal record, has_image
>> flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO.
>
>Ah I see. But do we actually store the status in the record itself,
>then at replay we don't care of the value of wal_consistency at
>replay. That's the same concept used by wal_compression. So shouldn't
>you have more specific checks related to that in checkConsistency? You
>actually don't need to check for anything in xlogreader.c, just check
>for the consistency if there is a need to do so, or do nothing.
I'm sorry, but I don't quite follow you here. If a wal record contains
an image, has_image should be set since it helps decoding the
record. But, during redo if XLogRecHasBlockImage() returns true, i.e.,
has_image is set, then it always restore the block. But, in our case,
a record can have a backup image which should not be restored. So, we need
to decide two things:
1. Does a record contain backup image? (required for decoding the record)
2. If it has an image, should we restore it during redo?
I think we sould decide these in DecodeXLogRecord() only. BKPBLOCK_HAS_IMAGE
answers the first question whereas BKPIMAGE_IS_REQUIRED_FOR_REDO
answers the second one. In DecodeXLogRecord(), we check that
BKPBLOCK_HAS_IMAGE should be set if wal_consistency is enabled for
this record. The flag has_image is set to
BKPIMAGE_IS_REQUIRED_FOR_REDO which is later used to decide whether we
want to restore a block or not.

> For now, I've kept this as a WARNING message to detect all inconsistencies
> at once. Once, the patch is finalized, I'll modify it as an ERROR message.

Or say FATAL. This way the server is taken down.

> Thoughts?
+1. I'll do that.

>A couple of extra thoughts:
>1) The routines of each rmgr are located in a dedicated file, for
>example GIN stuff is in ginxlog.c, etc. It seems to me that it would
>be better to move each masking function where it should be instead
>being centralized. A couple of routines need to be centralized, so I'd
>suggest putting them in a new file, like xlogmask.c, xlog because now
>this is part of WAL replay completely, including the lsn, the hint
>bint and the other common routines.
Sounds good. But, I think that the file name for common masking routines
should be as bufmask.c since we are masking the buffers only.

>2) Regarding page comparison:
>+/*
>+ * Compare the contents of two pages.
>+ * If the two pages are exactly same, it returns BLCKSZ. Otherwise,
>+ * it returns the location where the first mismatch has occurred.
>+ */
>+int
>+comparePages(char *page1, char *page2)
>We could just use memcpy() here. compareImages was useful to get a
>clear image of what the inconsistencies were, but you don't do that
>anymore.
memcmp(), right?

>5)
>+ has_image = record->blocks[block_id].has_image;
>+ record->blocks[block_id].has_image = true;
>+ if (!RestoreBlockImage(record, block_id, old_page))
>+ elog(ERROR, "failed to restore block image");
>+ record->blocks[block_id].has_image = has_image;
>Er, what? And BKPIMAGE_IS_REQUIRED_FOR_REDO?
Sorry, I completely missed this.

>6)
>+ /*
>+ * Remember that, if WAL consistency check is enabled for
>the current rmid,
>+ * we always include backup image with the WAL record.
>But, during redo we
>+ * restore the backup block only if needs_backup is set.
>+ */
>+ if (needs_backup)
>+ bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO;
>This should use wal_consistency[rmid]?
needs_backup is set when XLogRecordAssemble decides that backup image
should be included in the record for redo purpose. This image will be
restored during
redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included
image should be restored during redo(or has_image should be set or not).

>7) This patch has zero documentation. Please add some. Any human being
>on this list other than those who worked on the first versions
>(Heikki, Simon and I?) is going to have a hard time to review this
>patch in details moving on if there is no reference to tell what this
>feature does for the user...
>
>This patch is going to the good direction, but I don't think it's far
>from being ready for commit yet. So I am going to mark it as returned
>with feedback if there are no objections
I think only major change that this patch needs a proper and detailed
documentation. Other than that there are very minor changes which can
be done quickly. Right?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Albe Laurenz 2016-09-13 10:35:01 Nested loop join condition does not get pushed down to foreign scan
Previous Message Heikki Linnakangas 2016-09-13 08:44:01 Re: Supporting SJIS as a database encoding