Re: Online checksums verification in the backend

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-09-09 09:25:24
Message-ID: 20200909092524.GC57691@nol
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 07, 2020 at 05:50:38PM +0900, Michael Paquier wrote:
> On Mon, Sep 07, 2020 at 09:38:30AM +0200, Julien Rouhaud wrote:
> > Did you mean creating a new checksumfuncs.c file? I don't find any
> > such file in the current tree.
> Your patch adds checksumfuncs.c, so the subroutines grabbing a given
> block could just be moved there.

Sorry, I was in the middle of a rebase for another patch and missed the new
files added in this one. I added a new checksumfuncs.h for the required
include that should not be seen by client code. I kept checksumfuncs.c and
checksums.c so that the SQL visible declaration are separated from the rest of
the implementation as this is what we already do elsewhere I think. If that's
a problem I'll change and put everything in checksumfuncs.[ch].

I also moved the tap tests in src/test/modules and renamed the file with a 3
digits. For the record I initially copied src/test/modules/brin, and this is
apparently the only subdir that has a 2 digits pattern.

I also added a new WAIT_EVENT_CHECK_DELAY wait event.

> > I'm not sure I understand. Unless I missed something this approach
> > *cannot* raise a false positive. What it does is force a 2nd check
> > with stronger lock *to make sure it's actually a corruption*, so we
> > don't raise false positive. The only report that can happen in this
> > 1st loop is if smgread raises an error, which AFAICT can only happen
> > (at least with mdread) if the whole block couldn't be read, which is a
> > sign of a very bad problem. This should clearly be reported, as this
> > cannot be caused by the locking heuristics used here.
> We don't know how much this optimization matters though? Could it be
> possible to get an idea of that? For example, take the case of one
> relation with a fixed size in a read-only workload and a read-write
> workload (as long as autovacuum and updates make the number of
> relation blocks rather constant for the read-write case), doing a
> checksum verification in parallel of multiple clients working on the
> relation concurrently. Assuming that the relation is fully in the OS
> cache, we could get an idea of the impact with multiple
> (shared_buffers / relation size) rates to make the eviction more
> aggressive? The buffer partition locks, knowing that
> NUM_BUFFER_PARTITIONS caps that, should be the bottleneck, still it
> seems to me that it would be good to see if we have a difference.
> What do you think?

I assumed that the odds of having to check the buffer twice were so low, and
avoiding to keep a bufmapping lock while doing some IO was an uncontroversial
enough optimisation, but maybe that's only wishful thinking.

I'll do some becnhmarking and see if I can get some figures, but it'll probably
take some time. In the meantime I'm attaching v11 of the patch that should
address all other comments.

Attachment Content-Type Size
v11-0001-Add-a-pg_check_relation-function.patch text/plain 44.7 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey V. Lepikhov 2020-09-09 09:38:20 Re: [POC] Fast COPY FROM command for the table with foreign partitions
Previous Message Amit Kapila 2020-09-09 08:56:00 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions