Re: Online enabling of checksums

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject:	Re: Online enabling of checksums
Date:	2018-03-03 16:08:19
Message-ID:	CABUevEwQYBmFLQ5VYtDhWW37nH5WG85WA5oHMvMD0HtHtNzbSg@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On 03/03/2018 01:38 PM, Robert Haas wrote:
> > On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
> >> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >>> Hmmm, OK. So we need to have a valid checksum on a page, disable
> >>> checksums, set some hint bits on the page (which won't be
> >>> WAL-logged), enable checksums again and still get a valid
> >>> checksum even with the new hint bits? That's possible, albeit
> >>> unlikely.
> >>
> >> No, the problem is if - as is much more likely - the checksum is
> >> not still valid.
> >
> > Hmm, on second thought ... maybe I didn't think this through
> > carefully enough. If the checksum matches on the master by chance,
> > and the page is the same on the standby, then we're fine, right? It's
> > a weird accident, but nothing is actually broken. The failure
> > scenario is where the standby has a version of the page with a bad
> > checksum, but the master has a good checksum. So for example:
> > checksums disabled, master modifies the page (which is replicated),
> > master sets some hint bits (coincidentally making the checksum
> > match), now we try to turn checksums on and don't re-replicate the
> > page because the checksum already looks correct.
> >
>
> Yeah. Doesn't that pretty much mean we can't skip any pages that have
> correct checksum, because we can't rely on standby having the same page
> data? That is, this block in ProcessSingleRelationFork:
>
> /*
>  * If checksum was not set or was invalid, mark the buffer as dirty
>  * and force a full page write. If the checksum was already valid, we
>  * can leave it since we know that any other process writing the
>  * buffer will update the checksum.
>  */
> if (checksum != pagehdr->pd_checksum)
> {
>     START_CRIT_SECTION();
>     MarkBufferDirty(buf);
>     log_newpage_buffer(buf, false);
>     END_CRIT_SECTION();
> }
>
> That would mean this optimization - only doing the write when the
> checksum does not match - is broken.
>

Yes. I think that was the conclusion of this, as posted in
https://www.postgresql.org/message-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu%3DaDpGBT8Sk0RQE_g%40mail.gmail.com
:)
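
To make the failure scenario above concrete, here is a minimal self-contained sketch. It is not PostgreSQL code: ToyPage, the byte-sum toy_checksum, and every toy_* function are hypothetical stand-ins (PostgreSQL's real page checksum is a different algorithm), and "replication" is just a memcpy of a full-page image. It only models the sequence Robert described: the page is modified and replicated with checksums off, a hint bit is set on the master only, and the stale stored checksum happens to match the master's new contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy page: stored checksum plus payload. Not PostgreSQL's page layout. */
typedef struct ToyPage
{
    uint16_t pd_checksum;
    uint8_t  data[TOY_PAGE_SIZE];
} ToyPage;

/* Toy checksum (byte sum), chosen only so a collision is easy to construct. */
static uint16_t
toy_checksum(const ToyPage *page)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        sum = (uint16_t) (sum + page->data[i]);
    return sum;
}

static bool
toy_page_is_valid(const ToyPage *page)
{
    return toy_checksum(page) == page->pd_checksum;
}

/*
 * Process one page while enabling checksums. With skip_if_valid set, a page
 * whose stored checksum already matches is left alone (the optimization under
 * discussion). Otherwise the checksum is set and a full-page image is "sent"
 * to the standby. Returns true if an image was sent.
 */
static bool
toy_enable_checksum(ToyPage *master, ToyPage *standby, bool skip_if_valid)
{
    assert(master != NULL && standby != NULL);

    uint16_t checksum = toy_checksum(master);

    if (skip_if_valid && checksum == master->pd_checksum)
        return false;

    master->pd_checksum = checksum;
    memcpy(standby, master, sizeof(ToyPage));
    return true;
}

/* Runs the scenario; reports whether the standby's page verifies afterwards. */
static bool
toy_standby_valid_after_enable(bool skip_if_valid)
{
    ToyPage master;
    ToyPage standby;

    memset(&master, 0, sizeof(master));

    /* Checksums disabled: master modifies the page, and it is replicated. */
    master.data[0] = 0x10;
    master.pd_checksum = 0x11;      /* stale value from an earlier page state */
    standby = master;

    /* Master sets a hint bit; not WAL-logged, so the standby never sees it. */
    master.data[0] |= 0x01;         /* byte sum is now 0x11: matches by chance */

    toy_enable_checksum(&master, &standby, skip_if_valid);
    return toy_page_is_valid(&standby);
}
```

Calling toy_standby_valid_after_enable(true) models the skip-if-valid path: the master's page looks fine, no image is sent, and the standby is left with a page that fails verification. Calling it with false models always writing the page, which costs more on resume but leaves both sides consistent.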

> If that's the case, it probably makes restarts/resume more expensive,
> because this optimization was why after restart the already processed
> data was only read (and the checksums verified) but not written.
>

Yes, it definitely does. It's not a dealbreaker, but it's certainly a bit
painful not to be able to resume as cheaply.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In response to

Re: Online enabling of checksums at 2018-03-03 16:06:06 from Tomas Vondra

Responses

Re: Online enabling of checksums at 2018-03-03 16:17:31 from Tomas Vondra
