Re: Online checksums verification in the backend

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-03-18 06:06:19
Message-ID: 20200318060610.GA58497@nol
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 18, 2020 at 01:20:47PM +0900, Michael Paquier wrote:
> On Mon, Mar 16, 2020 at 09:21:22AM +0100, Julien Rouhaud wrote:
> > On Mon, Mar 16, 2020 at 12:29:28PM +0900, Michael Paquier wrote:
> >> With a large amount of
> >> shared buffer eviction you actually increase the risk of torn page
> >> reads. Instead of a logic relying on partition mapping locks, which
> >> could be unwise on performance grounds, did you consider different
> >> approaches? For example a kind of pre-emptive lock on the page in
> >> storage to prevent any shared buffer operation to happen while the
> >> block is read from storage, that would act like a barrier.
> >
> > Even with a workload having a large shared_buffers eviction pattern, I don't
> > think that there's a high probability of hitting a torn page. Unless I'm
> > mistaken it can only happen if all those steps happen concurrently to doing the
> > block read just after releasing the LWLock:
> >
> > - postgres read the same block in shared_buffers (including all the locking)
> > - dirties it
> > - writes part of the page
> >
> > It's certainly possible, but it seems so unlikely that the optimistic lock-less
> > approach seems like a very good tradeoff.
> Having false reports in this area could be very confusing for the
> user. That's for example possible now with checksum verification and
> base backups.

I agree, however this shouldn't be the case here, as the block will be
rechecked while holding proper lock the 2nd time in case of possible false
positive before being reported as corrupted. So the only downside is to check
twice a corrupted block that's not found in shared buffers (or concurrently
loaded/modified/half flushed). As the number of corrupted or concurrently
loaded/modified/half flushed blocks should usually be close to zero, it seems
worthwhile to have a lockless check first for performance reason.

> > For the record when I first tested that feature I did try to check dirty
> > blocks, and it seemed that dirty blocks of shared relation were sometimes
> > wrongly reported as corrupted. I didn't try to investigate more though.
> Hmm. It would be good to look at that, correct verification of shared
> relations matter.

I'll try to investigate, but non-dirty shared relation blocks can be checked
and work as intended.

> >> + * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
> >> + * disk either before the end of the next checkpoint or during recovery in
> >> + * case of unsafe shutdown
> >> Not sure that the indentation is going to react well on that part of
> >> the patch, perhaps it would be better to add some "/*-------" at the
> >> beginning and end of the comment block to tell pgindent to ignore this
> >> part?
> >
> > Ok. Although I think only the beginning comment is needed?
> From src/tools/pgindent/README:
> "pgindent will reflow any comment block that's not at the left margin.
> If this messes up manual formatting that ought to be preserved,
> protect the comment block with some dashes:"
> /*----------
> * Text here will not be touched by pgindent.
> *----------
> */

For instance the block comment in gen_partprune_steps_internal() disagrees.
Anyway I added both as all the nearby codes does that for overall function

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-03-18 06:13:12 Re: Online checksums verification in the backend
Previous Message Kirill Bychik 2020-03-18 06:02:58 Re: WAL usage calculation patch