Re: Online checksums verification in the backend

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-09-07 07:38:30
Message-ID: CAOBaU_bJEYemqwzQ4p8+txuT64AXUg2L0p8qHgs5vNGuZ0_kVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 7, 2020 at 8:59 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> +#include "postgres.h"
> +
> +#include "access/tupdesc.h"
> +#include "common/relpath.h"
> #include "storage/block.h"
> +#include "utils/relcache.h"
> +#include "utils/tuplestore.h"
> [...]
> +extern bool check_one_block(Relation relation, ForkNumber forknum,
> + BlockNumber blkno, uint16 *chk_expected,
> + uint16 *chk_found);
> I don't think that it is a good idea to add this much to checksum.h
> as these are includes coming mainly from the backend. Note that
> pg_checksum_page() is a function designed to be also available for
> frontend tools, with checksum.h something that can be included in
> frontends. This would mean that we could move all the buffer lookup
> APIs directly to checksumfuncs.c, or move that into a separate file
> closer to the location.

Did you mean creating a new checksumfuncs.c file? I don't find any
such file in the current tree.

> + * A zero checksum can never be computed, see pg_checksum_page() */
> +#define NoComputedChecksum 0
> Wouldn't it be better to rename that something like
> InvalidPageChecksum, and make use of it in pg_checksum_page()? It
> would be more consistent with the naming of transaction IDs, OIDs or
> even XLogRecPtr. And that could be a separate patch.

It seems quite ambiguous, as checksum validity usually has a different
meaning. And in the code added here, the meaning isn't that the
ckecksum is invalid but that there's no checsum as it cannot be
computed due to PageIsNew().

> +++ b/src/test/check_relation/t/01_checksums_check.pl
> @@ -0,0 +1,276 @@
> +use strict;
> +use warnings;
> It could be better to move that to src/test/modules/, so as it could
> be picked more easily by MSVC scripts in the future. Note that if we
> apply the normal naming convention here this should be named
> 001_checksum_check.pl.
>
> +subdir = src/test/check_relation
> +top_builddir = ../../..
> +include $(top_builddir)/src/Makefile.global
> Let's use a Makefile shaped in a way similar to modules/test_misc that
> makes use of TAP_TESTS = 1. There is the infra, let's rely on it for
> the regression tests.

Will fix.

> + pg_usleep(msec * 1000L);
> Could it be possible to add a wait event here? It would be nice to be
> able to monitor that in pg_stat_activity.

Sure, I missed that as this was first implemented as an extension.

> +if (exists $ENV{MY_PG_REGRESS})
> +{
> + $ENV{PG_REGRESS} = $ENV{MY_PG_REGRESS};
> +}
> What is MY_PG_REGRESS for? A remnant from an external makefile
> perhaps?

Indeed.

> + /*
> + * If we get a failure and the buffer wasn't found in shared buffers,
> + * reread the buffer with suitable lock to avoid false positive. See
> + * check_get_buffer for more details.
> + */
> + if (!found_in_sb && !force_lock)
> + {
> + force_lock = true;
> + goto Retry;
> + }
> As designed, we have a risk of false positives with a torn page in the
> first loop when trying to look for a given buffer as we would try to
> use smgrread() without a partition lock. This stresses me a bit, and
> false positives could scare users easily. Could we consider first a
> safer approach where we don't do that, and just read the page while
> holding the partition lock? OK, that would be more expensive, but at
> least that's safe in any case. My memory of this patch is a bit
> fuzzy, but this is itching me and this is the heart of the problem
> dealt with here :)

I'm not sure I understand. Unless I missed something this approach
*cannot* raise a false positive. What it does is force a 2nd check
with stronger lock *to make sure it's actually a corruption*, so we
don't raise false positive. The only report that can happen in this
1st loop is if smgread raises an error, which AFAICT can only happen
(at least with mdread) if the whole block couldn't be read, which is a
sign of a very bad problem. This should clearly be reported, as this
cannot be caused by the locking heuristics used here.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-09-07 07:40:57 Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks
Previous Message Dilip Kumar 2020-09-07 07:27:34 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions