Re: Online checksums verification in the backend

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-09-25 10:11:47
Message-ID: CAOBaU_Y_A0m5x2QwQ=z6aNf=M-uT1aFFVLeMcHfMDS8z+KrTTw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 16, 2020 at 11:46 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Sep 11, 2020 at 09:49:16AM +0200, Julien Rouhaud wrote:
> > Thanks!
>
> I got some numbers out of my pocket, using the following base
> configuration:
> [...]
>
> Within parenthesis is the amount of time taken by pg_relation_check()
> for a single check. This is not completely exact and I saw some
> variations, just to give an impression:
> Conns             64            128                256
> force_lock=true   60590 (7~8s)  55652 (10.2~10.5s) 46812 (9~10s)
> force_lock=false  58637 (5s)    54131 (6~7s)       37091 (1.1~1.2s)
>
> For connections higher than 128, I was kind of surprised to see
> pg_relation_check being more aggressive without the optimization, with
> much less contention on the buffer mapping LWLock actually, but that's
> an impression from looking at pg_stat_activity.
>
> Looking at the wait events for each run, I saw many more hiccups with
> the buffer mapping LWLock when forcing the lock rather than not,
> though I was still able to see some contention when not forcing the
> lock. Not surprising, as this evicts a bunch of pages from shared
> buffers.
>
> > I used all default settings, but by default checksum_cost_delay is 0
> > so there shouldn't be any throttling.
>
> Thanks, so did I. From what I can see, there could also be benefits
> in not using the optimization by default, so that the relation check
> applies some natural throttling by making the checks actually slower
> (there is a link between the individual runtime of
> pg_relation_check() and the TPS).

Thanks a lot for the tests! I'm not surprised that forcing the lock
slows down pg_check_relation() execution, but I'm a bit surprised
that holding the buffer mapping lock longer, in a workload that has a
lot of evictions, actually makes things faster. Do you have any idea
why that's the case?

> So it also seems to me that the
> throttling parameters would be beneficial, but there is also an
> argument for not including any throttling in a first version if the
> optimization to go full speed is not around. Using three new GUCs
> for those function calls is still too much IMO

I'm assuming that you'd prefer to remove both the optimization and the
throttling part? I'll do that in the next version unless there are
objections.

>, so
> there is also the argument to move all this stuff into a new contrib/
> module, and have a bgworker implementation as part of it as it would
> need shared_preload_libraries anyway.
>
> Also, I have been putting some thought into the APIs able to fetch a
> buffer without going through the shared buffers. Neither checksum.c
> nor checksumfuncs.c is the right place: checksum.c, because it should
> be a module those APIs depend on, including only the most-internal
> logic of the checksum algorithm and its computation; checksumfuncs.c,
> because we need to think about the case of a background worker as
> well (one that could spawn a set of dynamic workers connecting to
> different databases to do checksum verification?). It would be good
> to keep the handling of the buffer mapping lock as well as the calls
> to smgrread() in a single place.
> ReadBuffer_common() is a natural place for that, though for our use
> case it means the addition of three new options:
> - Being able to pass down directly a buffer pointer to save the page
> contents.
> - Being able to not verify a page directly, leaving the verification
> to the caller upthread.
> - Addition of a new mode, that I am calling here RBM_PRIVATE, where
> we actually read the page from disk if it is not yet in shared
> buffers, except that we fill in the contents of the page using the
> pointer given by the caller. That's different from the use of local
> buffers, as we don't need anything like the amount of complexity
> temporary tables bring just for per-page checks.
>
> Another idea would be to let ReadBuffer_common() do the check by
> itself, with a different mode, a kind of RBM_VALIDATE, where we just
> return a verification state of the page that callers can consume.
>
> This also comes with some more advantages:
> - Tracking of reads from disk with track_io_timing.
> - Addition of some private stats dedicated to this mode, with new
> fields in pgBufferUsage, all in a single place.

I agree that putting the code near ReadBuffer_common() would be a
good idea. However, that means that I can't move all the code to
contrib/, so I'm wondering what you'd like to see going there. I can
see some value in having the SQL functions available in core rather
than in contrib, e.g. if you need to quickly check a relation on a
standby, without having to create the extension on the primary node
first.

Then, I'm a bit worried about adding this code to ReadBuffer_common():
what it does is quite different, and I'm afraid that it'll make
ReadBuffer_common() more complex than needed, which is probably not a
good idea for something as critical as this function.

What do you think?
