Re: Online checksums verification in the backend

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-04-03 09:39:11
Message-ID: 20200403093911.GB1206@nol
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 03, 2020 at 12:24:50PM +0900, Masahiko Sawada wrote:
> On Sat, 28 Mar 2020 at 21:19, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> The current patch still checks SearchSysCacheExists1() before
> relation_open. Why do we need to call SearchSysCacheExists1() here? I
> think if the given relation doesn't exist, relation_open() will raise
> an error "could not open relation with OID %u".
>
> + /* Open the relation if it exists. */
> + if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
> + {
> + relation = relation_open(relid, AccessShareLock);
> + }

Oops yes sorry about that. Fixed.

> > > 8.
> > > + if (PageIsVerified(buffer, blkno))
> > > + {
> > > + /*
> > > + * If the page is really new, there won't by any checksum to be
> > > + * computed or expected.
> > > + */
> > > + *chk_expected = *chk_found = NoComputedChecksum;
> > > + return true;
> > > + }
> > > + else
> > > + {
> > > + /*
> > > + * There's a corruption, but since this affect PageIsNew, we
> > > + * can't compute a checksum, so set NoComputedChecksum for the
> > > + * expected checksum.
> > > + */
> > > + *chk_expected = NoComputedChecksum;
> > > + *chk_found = hdr->pd_checksum;
> > > + }
> > > + return false;
> > >
> > > * I think the 'else' is not necessary here.
> >
> > AFAICT it's, see below.
> >
> > > * Setting *chk_expected and *chk_found seems useless when we return
> > > true. The caller doesn't use them.
> >
> > Indeed, fixed.
>
> The patch still sets values to both?
>
> + if (PageIsVerified(buffer, blkno))
> + {
> + /* No corruption. */
> + *chk_expected = *chk_found = NoComputedChecksum;
> + return true;
> + }

Sorry again, fixed.

> > > * Should we forcibly overwrite ignore_checksum_failure to off in
> > > pg_check_relation()? Otherwise, this logic seems not work fine.
> > >
> > > * I don't understand why we cannot compute a checksum in case where a
> > > page looks like a new page but is actually corrupted. Could you please
> > > elaborate on that?
> >
> > PageIsVerified has a different behavior depending on whether the page looks new
> > or not. If the page looks like new, it only checks that it's indeed a new
> > page, and otherwise try to verify the checksum.
> >
> > Also, pg_check_page() has an assert to make sure that the page isn't (or don't
> > look like) new.
> >
> > So it seems to me that the 'else' is required to properly detect a real or fake
> > PageIsNew, and try to compute checksums only when required.
>
> Thank you for your explanation! I understand.
>
> I thought we can arrange the code to something like:
>
> if (PageIsNew(hdr))
> {
> if (PageIsVerified(hdr))
> {
> *chk_expected = *chk_found = NoComputedChecksum;
> return true;
> }
>
> *chk_expected = NoComputedChecksum;
> *chk_found = hdr->pd_checksum;
> return false;
> }
>
> But since it's not a critical problem you can ignore it.

I like it, so done!

> > > 8.
> > > + {
> > > + {"checksum_cost_page_hit", PGC_USERSET, RESOURCES_CHECKSUM_DELAY,
> > > + gettext_noop("Checksum cost for a page found in the buffer cache."),
> > > + NULL
> > > + },
> > > + &ChecksumCostPageHit,
> > > + 1, 0, 10000,
> > > + NULL, NULL, NULL
> > > + },
> > >
> > > * There is no description about the newly added four GUC parameters in the doc.
> > >
> > > * We need to put new GUC parameters into postgresql.conf.sample as well.
> >
> > Fixed both.
> >
> > > * The patch doesn't use checksum_cost_page_hit at all.
> >
> > Indeed, I also realized that while working on previous issues. I removed it
> > and renamed checksum_cost_page_miss to checksum_cost_page.
>
> Perhaps we can use checksum_cost_page_hit when we found the page in
> the shared buffer but it's marked as dirty?

The thing is that when the buffer is dirty, we won't do any additional check,
thus not adding any overhead. What may be needed here is to account for the
locking overhead (in all cases), so that if all (or almost all) the buffers are
dirty and in shared buffers the execution can be throttled. I don't know how
much an issue it can be, but if that's something to be fixes then page_hit
doesn't look like the right answer for that.

> I've read the latest patch and here is random comments:
>
> 1.
> + /*
> + * Add a page miss cost, as we're always reading outside the shared
> + * buffers.
> + */
> + /* Add a page cost. */
> + ChecksumCostBalance += ChecksumCostPage;
>
> There are duplicate comments.

Fixed.

> 2.
> + /* Dirty pages are ignored as they'll be flushed soon. */
> + if (buf_state & BM_DIRTY)
> + checkit = false;
>
> Should we check the buffer if it has BM_TAG_VALID as well here? I
> thought there might be a possibility that BufTableLookup() returns a
> buf_Id but its buffer tag is not valid for example when the previous
> read failed after inserting the buffer tag to the buffer table.

Good point, fixed.

> 3.
> + /* Add a page cost. */
> + ChecksumCostBalance += ChecksumCostPage;
> +
> + return checkit;
> +}
>
> The check_get_buffer() seems to be slightly complex to me but when we
> reached the end of this function we always return true. Similarly, in
> the case where we read the block while holding a partition lock we
> always return true as well. Is my understanding right? If so, it might
> be better to put some assertions.

Yes it's a little bit complex. I used this approach to avoid the need to
release the locks all over the place, but maybe this doesn't really improve
things. I added asserts and comments anyway as suggested, thanks.

> 4.
> @@ -10825,6 +10825,14 @@
> proallargtypes => '{oid,text,int8,timestamptz}', proargmodes => '{i,o,o,o}',
> proargnames => '{tablespace,name,size,modification}',
> prosrc => 'pg_ls_tmpdir_1arg' },
> +{ oid => '9147', descr => 'check data integrity for one or all relations',
> + proname => 'pg_check_relation', proisstrict => 'f', procost => '10000',
> + prorows => '20', proretset => 't', proparallel => 'r',
> + provolatile => 'v', prorettype => 'record', proargtypes => 'regclass text',
> + proallargtypes => '{regclass,text,oid,int4,int8,int4,int4}',
> + proargmodes => '{i,i,o,o,o,o,o}',
> + proargnames =>
> '{relation,fork,relid,forknum,failed_blocknum,expected_checksum,found_checksum}',
> + prosrc => 'pg_check_relation' },
>
> Why is the pg_check_relation() is not a strict function? I think
> prostrict can be 'true' for this function and we can drop checking if
> the first argument is NULL.

That's because the fork is still optional. While this could be made mandatory
without much problems, I think we'll eventually want to add a way to check only
a subset of a fork, so it seemed to me that is wasn't worth changing that now.

> 5.
> + memset(values, 0, sizeof(values));
> + memset(nulls, 0, sizeof(nulls));
>
> I think we can do memset right before setting values to them, that is,
> after checking (!found_in_sb && !force_lock).

Indeed, done!

> 6.
> +static bool
> +check_buffer(char *buffer, uint32 blkno, uint16 *chk_expected,
> + uint16 *chk_found)
> +{
> + PageHeader hdr = (PageHeader) buffer;
> +
> + Assert(chk_expected && chk_found);
> +
> + if (PageIsNew(hdr))
> + {
> + /*
> + * Check if the page is really new or if there's a corruption that
> + * affected PageIsNew detection. Note that PageIsVerified won't try to
> + * detect checksum corruption in this case, so there's no risk of
> + * duplicated corruption report.
> + */
> + if (PageIsVerified(buffer, blkno))
>
> How about using Page instead of PageHeader? Looking at other codes,
> ISTM we usually pass Page to both PageIsNew() and PageIsVerified().

Agreed, done.

> 7.
> + <entry>
> + <literal><function>pg_check_relation(<parameter>relation</parameter>
> <type>oid</type>[, <parameter>fork</parameter>
> <type>text</type>])</function></literal>.
> + </entry>
>
> +{ oid => '9147', descr => 'check data integrity for one or all relations',
> + proname => 'pg_check_relation', proisstrict => 'f', procost => '10000',
> + prorows => '20', proretset => 't', proparallel => 'r',
> + provolatile => 'v', prorettype => 'record', proargtypes => 'regclass text',
> + proallargtypes => '{regclass,text,oid,int4,int8,int4,int4}',
> + proargmodes => '{i,i,o,o,o,o,o}',
> + proargnames =>
> '{relation,fork,relid,forknum,failed_blocknum,expected_checksum,found_checksum}',
> + prosrc => 'pg_check_relation' },
>
> The function argument data types don't match in the doc and function
> declaretion. relation is 'oid' in the doc but is 'regclass' in the
> function declaretion.

Fixed.

> 8.
> +#define SRF_COLS 5 /* Number of output arguments in the SRF */
>
> Looking at similar built-in functions that return set of records they
> use a more specific name for the number of returned columns such as
> PG_STAT_GET_WAL_SENDERS_COLS and PG_GET_SHMEM_SIZES_COLS. How about
> PG_CHECK_RELATION_COLS?
>
> check_relation_fork() seems to quite depends on pg_check_relation()
> because the returned tuplestore is specified by pg_check_relation().
> It's just an idea but to improve reusability, how about moving
> check_relation_fork() to checksumfunc.c? That is, in checksumfuncs.c
> while iterating all blocks we call a new function in checksum.c, say
> check_one_block() function, which has the following part and is
> responsible for getting, checking the specified block and returning a
> boolean indicating whether the block has corruption or not, along with
> chk_found and chk_expected:
>
> /*
> * To avoid too much overhead, the buffer will be first read without
> * the locks that would guarantee the lack of false positive, as such
> * events should be quite rare.
> */
> Retry:
> if (!check_get_buffer(relation, forknum, blkno, buffer, force_lock,
> &found_in_sb))
> continue;
>
> if (check_buffer(buffer, blkno, &chk_expected, &chk_found))
> continue;
>
> /*
> * If we get a failure and the buffer wasn't found in shared buffers,
> * reread the buffer with suitable lock to avoid false positive. See
> * check_get_buffer for more details.
> */
> if (!found_in_sb && !force_lock)
> {
> force_lock = true;
> goto Retry;
> }
>
> A new function in checksumfuncs.c or pg_check_relation will be
> responsible for storing the result to the tuplestore. That way,
> check_one_block() will be useful for other use when we want to check
> if the particular block has corruption with low overhead.

Yes, I agree that passing the tuplestore isn't an ideal approach and some
refactoring should probably happen. One thing is that this wouldn't be
"check_one_block()" but "check_one_block_on_disk()" (which could also be from
the OS cache). I'm not sure how useful it's in itself. It also raises some
concerns about the throttling. I didn't change that for now, but I hope
there'll be some other feedback about it.

Attachment Content-Type Size
v6-0001-Add-a-pg_check_relation-function.patch text/plain 42.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-04-03 09:58:11 Re: User Interface for WAL usage data
Previous Message Amit Langote 2020-04-03 09:34:20 Re: adding partitioned tables to publications