Re: Online verification of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-03-07 11:53:30
Message-ID: 29a0ef4d-7d5d-fe5c-253b-3f7f53df0859@2ndquadrant.com
Lists: pgsql-hackers

On 3/6/19 6:42 PM, Andres Freund wrote:
>
> ...
>
> To me the right way seems to be to IO lock the page via PG after such a
> failure, and then retry. Which should be relatively easily doable for
> the basebackup case, but obviously harder for the pg_verify_checksums
> case.
>

Actually, what do you mean by "IO lock the page"? Just waiting for the
current I/O to complete (essentially BM_IO_IN_PROGRESS)? Or actually
acquiring a lock and holding it for the duration of the check?

The former does not really help, because there might be another I/O
request initiated right after, interfering with the retry.

The latter might work, assuming the check is fast (which it probably
is). I wonder if this might cause issues due to loading possibly
corrupted data (with invalid checksums) into shared buffers. But then
again, we could just hack a special version of ReadBuffer_common() which
would just

(a) check if a page is in shared buffers, and if it is then consider the
checksum correct (because the checksum on the in-memory copy may be stale,
and the page was read successfully so it was OK at that moment)

(b) if it's not in shared buffers already, try reading it and verify the
checksum, and then just evict it right away (so as not to pollute shared
buffers)
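
To make (a) and (b) a bit more concrete, here is a rough, self-contained
toy model in C. It is only a sketch of the control flow, not PostgreSQL
code: ToyPage, disk[], in_shared_buffers[], toy_checksum(),
toy_write_page() and verify_block() are all hypothetical names made up
for illustration, the checksum is a simple Fletcher-style sum rather than
the real algorithm, and the locking that would protect the read is only
indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64   /* toy page size, not PostgreSQL's BLCKSZ */
#define TOY_NPAGES    4

/* Hypothetical on-disk page: a stored checksum plus payload. */
typedef struct {
    uint16_t checksum;
    uint8_t  data[TOY_PAGE_SIZE];
} ToyPage;

static ToyPage disk[TOY_NPAGES];              /* stands in for the relation file */
static bool    in_shared_buffers[TOY_NPAGES]; /* stands in for a buffer lookup */

typedef enum { VERIFY_OK_CACHED, VERIFY_OK_READ, VERIFY_FAILED } VerifyResult;

/* Toy Fletcher-style checksum; an assumption, not the real page checksum. */
static uint16_t toy_checksum(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 251;
        b = (b + a) % 251;
    }
    return (uint16_t) ((b << 8) | a);
}

/* Write a page to "disk", computing its checksum at write time. */
static void toy_write_page(int blkno, uint8_t fill)
{
    assert(blkno >= 0 && blkno < TOY_NPAGES);
    memset(disk[blkno].data, fill, TOY_PAGE_SIZE);
    disk[blkno].checksum = toy_checksum(disk[blkno].data, TOY_PAGE_SIZE);
}

static VerifyResult verify_block(int blkno)
{
    assert(blkno >= 0 && blkno < TOY_NPAGES);

    /* (a) already in shared buffers: treat as verified, do not touch disk */
    if (in_shared_buffers[blkno])
        return VERIFY_OK_CACHED;

    /* (b) not cached: read into a private copy; in a real implementation
     * this read would happen while holding the lock for the whole check */
    ToyPage copy;
    memcpy(&copy, &disk[blkno], sizeof(copy));
    bool ok = (toy_checksum(copy.data, TOY_PAGE_SIZE) == copy.checksum);

    /* "evict right away": the private copy is simply dropped here and
     * in_shared_buffers[] is never set, so the cache is not polluted */
    return ok ? VERIFY_OK_READ : VERIFY_FAILED;
}
```

The point of the sketch is just that a cached page short-circuits the
check entirely, while an uncached page is read, verified and discarded
without ever being entered into the cache.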

Or did you have something else in mind?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
