Re: WAL consistency check facility

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: WAL consistency check facility
Date: 2016-09-07 10:22:47
Message-ID: CAGz5QC+gvEn3tgAXsfPGaCiBKOEv+3wisBj4g0F-dxoj_E_Ltg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

As per the earlier discussions, I've attached the updated patch for
WAL consistency check feature. This is how the patch works:

- If WAL consistency check is enabled for a rmgrID, we always include
the backup image in the WAL record.
- I've extended the RmgrTable with a new function pointer
rm_checkConsistency, which is called after rm_redo. (only when WAL
consistency check is enabled for this rmgrID)
- In each rm_checkConsistency, both backup pages and buffer pages are
masked accordingly before any comparison.
- In postgresql.conf, a new guc variable named 'wal_consistency' is
added. Default value of this variable is 'None'. Valid values are
combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist,
BRIN, Generic and XLOG. It can also be set to 'All' to enable all the
values.
- In recovery tests (src/test/recovery/t), I've added wal_consistency
parameter in the existing scripts. This feature doesn't change the
expected output. If there is any inconsistency, it can be verified in
corresponding log file.

Results
------------------------

I've tested with installcheck and installcheck-world in master-standby
set-up. Followings are the configuration parameters.

Master:
wal_level = replica
max_wal_senders = 3
wal_keep_segments = 4000
hot_standby = on
wal_consistency = 'All'

Standby:
wal_consistency = 'All'

I got two types of inconsistencies as following:

1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup
page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after
redo, only BTP_DELETED flag is set in buffer page. I assume that we
should clear all btpo_flags before setting BTP_DELETED in
_bt_unlink_halfdead_page().

2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in
REVMAP page. This happens only for two cases. I'm not sure what the
reason can be.

I haven't done sufficient tests yet to measure the overhead of this
modification. I'll do that next.

Thanks to Amit Kapila, Dilip Kumar and Robert Haas for their off-line
suggestions.

Thoughts?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

On Thu, Sep 1, 2016 at 11:34 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Sep 1, 2016 at 9:23 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Indeed, it had occurred to me that we might not even want to compile
>> this code into the server unless WAL_DEBUG is defined; after all, how
>> does it help a regular user to detect that the server has a bug? Bug
>> or no bug, that's the code they've got. But on further reflection, it
>> seems like it could be useful: if we suspect a bug in the redo code
>> but we can't reproduce it here, we could ask the customer to turn this
>> option on to see whether it produces logging indicating the nature of
>> the problem. However, because of the likely expensive of enabling the
>> feature, it seems like it would be quite desirable to limit the
>> expense of generating many extra FPWs to the affected rmgr. For
>> example, if a user has a table with a btree index and a gin index, and
>> we suspect a bug in GIN, it would be nice for the user to be able to
>> enable the feature *only for GIN* rather than paying the cost of
>> enabling it for btree and heap as well.[2]
>
> Yes, that would be rather a large advantage.
>
> I think that there really is no hard distinction between users and
> hackers. Some people will want to run this in production, and it would
> be a lot better if performance was at least not atrocious. If amcheck
> couldn't do the majority of its verification with only an
> AccessShareLock, then users probably just couldn't use it. Heroku
> wouldn't have been able to use it on all production databases. It
> wouldn't have mattered that the verification was no less effective,
> since the bugs it found would simply never have been observed in
> practice.
>
> --
> Peter Geoghegan

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
walconsistency_v6.patch text/x-patch 79.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-09-07 10:28:56 Re: Declarative partitioning - another take
Previous Message Amit Kapila 2016-09-07 10:08:21 Re: Write Ahead Logging for Hash Indexes