Re: Online enabling of checksums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: Online enabling of checksums
Date: 2018-03-02 13:35:32
Message-ID: CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu=aDpGBT8Sk0RQE_g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
> > Also if that wasn't clear -- we only do the full page write if there isn't
> > already a checksum on the page and that checksum is correct.
>
> Hmm.
>
> Suppose that on the master there is a checksum on the page and that
> checksum is correct, but on the standby the page contents differ in
> some way that we don't always WAL-log, like as to hint bits, and there
> the checksum is incorrect. Then you'll enable checksums when the
> standby still has some pages without valid checksums, and disaster
> will ensue.
>
> I think this could be hard to prevent if checksums are turned on and
> off multiple times.
>
>
Do we ever make hint bit changes on the standby, for example? If so, that
would definitely cause problems. I didn't realize we did, actually...

I guess we could get there even if we don't, via:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL-logged change on the master, which updates the checksum but does
  *not* replicate
* Checksums are re-enabled
* Worker sees the checksum as correct, and thus does not force a full page
  write
* Worker completes and flips checksums on, which replicates. At this point,
  if the replica reads the page, boom.
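
To make the failure mode concrete, here is a rough toy model in Python. It is
not PostgreSQL code: `Page`, `page_checksum`, `enable_checksums` and
`standby_ok_after_enable` are hypothetical names, and CRC32 merely stands in
for the real page checksum. The model takes the end state described in this
thread as its premise (the master's page verifies, while the standby's copy
differs in bytes that were never WAL-logged and carries a checksum that does
not match them) and only illustrates what the worker's skip decision does
from there.

```python
import zlib
from dataclasses import dataclass


def page_checksum(data: bytes) -> int:
    # Stand-in for the real page checksum (assumption: CRC32 is used only
    # because it ships with the standard library).
    return zlib.crc32(data)


@dataclass
class Page:
    data: bytes
    stored_checksum: int

    def verifies(self) -> bool:
        return self.stored_checksum == page_checksum(self.data)


def enable_checksums(master: Page, standby: Page, skip_valid_pages: bool) -> None:
    """Toy checksum worker: it only ever inspects the master's copy."""
    if skip_valid_pages and master.verifies():
        # The optimisation: no full page write, so nothing reaches the standby.
        return
    # Full page write: recompute on the master and ship the whole page image.
    master.stored_checksum = page_checksum(master.data)
    standby.data = master.data
    standby.stored_checksum = master.stored_checksum


def standby_ok_after_enable(skip_valid_pages: bool) -> bool:
    # Premise from the thread: the master page verifies, the standby copy
    # differs in a non-WAL-logged byte and its stored checksum is stale.
    master = Page(b"row|hint=1", page_checksum(b"row|hint=1"))
    standby = Page(b"row|hint=0", page_checksum(b"row|hint=1"))
    enable_checksums(master, standby, skip_valid_pages)
    return standby.verifies()
```

With `skip_valid_pages=True` the standby's page still fails verification
after the worker finishes; with `skip_valid_pages=False` the full page write
overwrites the standby's copy and it verifies.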

I guess we have to remove that optimisation. It's definitely a bummer, but
I don't think it's an absolute dealbreaker.

We could say that we keep the optimisation if wal_level=minimal for
example, because then we know there is no replica. But I doubt that's worth
it?
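
If that variant were kept, the worker's check might look roughly like the
sketch below. `can_skip_full_page_write` is a hypothetical helper, not an
existing function; the only thing it encodes is the idea above that
`wal_level=minimal` rules out a replica.

```python
def can_skip_full_page_write(wal_level: str, checksum_valid: bool) -> bool:
    """Hypothetical predicate for the checksum worker (illustration only)."""
    # With wal_level=minimal there can be no replica whose pages might have
    # diverged, so a page that already verifies could be left alone.
    # For any other wal_level, always force the full page write.
    return wal_level == "minimal" and checksum_valid
```

For example, `can_skip_full_page_write("replica", True)` is false, so a page
with a valid checksum would still get a full page write whenever a standby
could exist.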

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
