Re: Online checksums verification in the backend

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-03-16 03:29:28
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 11, 2020 at 08:18:23AM +0100, Julien Rouhaud wrote:
> The cfbot reported a build failure, so here's a rebased v2 which also contains
> the pg_stat_report_failure() call and extra log info.

+ * - if a block is not found in shared_buffers, the LWLock is relased and the
+ * block is read from disk without taking any lock. If an error is detected,
+ * the read block will be discarded and retrieved again while holding the
+ * LWLock. This is because an error due to concurrent write is possible but
+ * very unlikely, so it's better to have an optimistic approach to limit
+ * locking overhead
This can be risky with false positives, no? With a large amount of
shared buffer eviction you actually increase the risk of torn page
reads. Instead of a logic relying on partition mapping locks, which
could be unwise on performance grounds, did you consider different
approaches? For example a kind of pre-emptive lock on the page in
storage to prevent any shared buffer operation to happen while the
block is read from storage, that would act like a barrier.

+ * Vacuum's GUCs are used to avoid consuming too much resources while running
+ * this tool.
Shouldn't this involve separate GUCs instead of the VACUUM ones? I
guess that this leads to the fact that this function may be better as
a contrib module, with the addition of some better-suited APIs in core
(see paragraph above).

+ make check
+ make installcheck
Why is installcheck mentioned here?

I don't think that it is appropriate to place the SQL-callable part in
the existing checksum.c. I would suggest instead a new file, say
checksumfuncs.c in src/backend/utils/adt/, holding any SQL functions
for checksums.

-SUBDIRS = perl regress isolation modules authentication recovery
+SUBDIRS = perl regress isolation modules authentication check_relation \
+ recovery subscription
It seems to me that this test would be a good fit for

+static void
+check_all_relations(TupleDesc tupdesc, Tuplestorestate *tupstore,
+ ForkNumber forknum)
Per the argument of bloat, I think that I would remove
check_all_relation() as this function could take a very long time to
run, and just make the SQL function strict.

+ * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
+ * disk either before the end of the next checkpoint or during recovery in
+ * case of unsafe shutdown
Wouldn't it actually be a good thing to check that the page on storage
is fine in this case? This depends on the system settings and the
checkpoint frequency, but checkpoint_timeout can be extended up to 1
day. And plenty of things could happen to the storage in one day,
including a base backup that includes a corrupted page on storage,
that this function would not be able to detect.

+ * - if a block is otherwise found in shared_buffers, an IO lock is taken on
+ * the block and the block is then read from storage, ignoring the block in
+ * shared_buffers
Yeah, I think that you are right here to check the page on storage

+ * we detect if a block is in shared_buffers or not. See get_buffer()
+ * comments for more details about the locking strategy.
get_buffer() does not exist in your patch, check_get_buffer() does.

+ * - if a block is not found in shared_buffers, the LWLock is relased and the
+ * To avoid torn page and possible false postives when reading data, and

+ if (!DataChecksumsEnabled())
+ elog(ERROR, "Data checksums are not enabled");
Note that elog() is for the class of errors which are never expected,
and here a caller of pg_check_relation() with checksums disabled can
trigger that. So you need to call ereport() with

+ * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
+ * disk either before the end of the next checkpoint or during recovery in
+ * case of unsafe shutdown
Not sure that the indentation is going to react well on that part of
the patch, perhaps it would be better to add some "/*-------" at the
beginning and end of the comment block to tell pgindent to ignore this

Based on the feedback gathered on this thread, I guess that you should
have a SRF returning the list of broken blocks, as well as NOTICE
messages. Another thing to consider is the addition of a range
argument to only check a certain portion of the blocks, say one
segment file at a time, etc. Fine by me to not include in the first
flavor of the patch.

The patch needs documentation.

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2020-03-16 03:46:47 Re: [HACKERS] WAL logging problem in 9.4.3?
Previous Message Masahiko Sawada 2020-03-16 03:26:55 Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager