Re: Online enabling of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Online enabling of checksums
Date: 2018-04-06 12:34:43
Message-ID: 0b39d372-472e-00bf-69ea-bea428020d8a@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/06/2018 11:25 AM, Magnus Hagander wrote:
>
>
> On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres(at)anarazel(dot)de
> <mailto:andres(at)anarazel(dot)de>> wrote:
>
> Hi,
>
> On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> > On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres(at)anarazel(dot)de <mailto:andres(at)anarazel(dot)de>> wrote:
> > > Is there any sort of locking that guarantees that worker processes see
> > > an up2date value of
> > > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > > there's not. So you can afaict end up with checksums being computed by
> > > the worker, but concurrent writes missing them.  The window is going to
> > > be at most one missed checksum per process (as the unlocking of the page
> > > is a barrier) and is probably not easy to hit, but that's dangerous
> > > enough.
> > >
> >
> > So just to be clear of the case you're worried about. It's basically:
> > Session #1 - sets checksums to inprogress
> > Session #1 - starts dynamic background worker ("launcher")
> > Launcher reads and enumerates pg_database
> > Launcher starts worker in first database
> > Worker processes first block of data in database
> > And at this point, Session #2 has still not seen the "checksums inprogress"
> > flag and continues to write without checksums?
>
> Yes.  I think there are some variations of that, but yes, that's pretty
> much it.
>
>
> > That seems like quite a long time to me -- is that really a problem?
>
> We don't generally build locking models that are only correct based on
> likelihood. Especially not without a lengthy comment explaining that
> analysis.
>
>
> Oh, that's not my intention either -- I just wanted to make sure I
> was thinking about the same issue you were.
>

I agree we shouldn't rely on chance here - if we might read a stale
value, we need to fix that of course.

I'm not quite sure I fully understand the issue, though. I assume both
LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
happen the process would need to be already past LockBufHdr when the
checksum version is updated. In which case it can use a stale version
when writing the buffer out. Correct?

I wonder if that's actually a problem, considering the checksum worker
will then overwrite all data with correct checksums anyway. So the other
process would have to overwrite the buffer after checksum worker, at
which point it'll have to go through LockBufHdr.

So I'm not sure I see the problem here. But perhaps LockBufHdr is not a
memory barrier?

> Since you know a lot more about that type of interlocks than I do :) We
> already wait for all running transactions to finish before we start
> doing anything. Obviously transactions != buffer writes (and we have
> things like the checkpointer/bgwriter to consider). Is there something
> else that we could safely just *wait* for? I have no problem whatsoever
> if this is a long wait (given the total time). I mean to the point of
> "what if we just stick a sleep(10) in there" level waiting.
>
> Or can that somehow be cleanly solved using some of the new atomic
> operators? Or is that likely to cause the same kind of overhead as
> throwing a barrier in there?
>

Perhaps the easiest thing we could do is walk shared buffers and do
LockBufHdr/UnlockBufHdr, which would guarantee no session is the process
of writing out a buffer with possibly stale checksum flag. Of course,
it's a bit brute-force-ish, but it's not that different from the waits
for running transactions and temporary tables.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2018-04-06 12:43:59 Re: csv format for psql
Previous Message Peter Eisentraut 2018-04-06 12:30:28 Re: [HACKERS] logical decoding of two-phase transactions