Re: Online enabling of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Online enabling of checksums
Date: 2018-09-30 08:48:36
Lists: pgsql-hackers

On 09/29/2018 06:51 PM, Stephen Frost wrote:
> Greetings,
> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>> On 09/29/2018 02:19 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>>>> While looking at the online checksum verification patch (which I guess
>>>> will get committed before this one), it occurred to me that disabling
>>>> checksums may need to be more elaborate, to protect against someone
>>>> using the stale flag value (instead of simply switching to "off"
>>>> assuming that's fine).
>>>> The signals etc. seem good enough for our internal stuff, but what if
>>>> someone uses the flag in a different way? E.g. the online checksum
>>>> verification runs as an independent process (i.e. not a backend) and
>>>> reads the control file to find out if the checksums are enabled or not.
>>>> So if we just switch from "on" to "off" that will break.
>>>> Of course, we may also say "Don't disable checksums while online
>>>> verification is running!" but that's not ideal.
>>> I'm not really sure what else we could say here..? I don't particularly
>>> see an issue with telling people that if they disable checksums while
>>> they're running a tool that's checking the checksums that they're going
>>> to get odd results.
>> I don't know, to be honest. I was merely looking at the online
>> verification patch and realized that if someone disables checksums it
>> won't notice it (because it only reads the flag once, at the very
>> beginning) and will likely produce bogus errors.
>> Although, maybe it won't - it now uses a checkpoint LSN, so that might
>> fix it. The checkpoint LSN is read from the same controlfile as the
>> flag, so we know the checksums were enabled during that checkpoint. So
>> if we ignore failures with a newer LSN, that should do the trick, no?
>> So perhaps that's the right "protocol" to handle this?
> I certainly don't think we need to do anything more.

Not sure I agree. I'm not suggesting we absolutely have to write a huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into the docs.

FWIW pg_basebackup (in the default "verify checksums" mode) has this
issue too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.
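
To make the checkpoint LSN idea concrete, here is a rough sketch of the
rule as I understand it. This is not code from either patch; the names
(ControlFileSnapshot, Page, should_report_failure) are made up for
illustration, and LSNs are modelled as plain integers. It only shows the
decision itself: a failure counts when the page LSN is not newer than
the checkpoint LSN read together with the flag.

```python
from dataclasses import dataclass


@dataclass
class ControlFileSnapshot:
    """Hypothetical view of the control file, read once at startup."""
    checksums_enabled: bool
    checkpoint_lsn: int  # LSN of the checkpoint recorded in the same file


@dataclass
class Page:
    """Hypothetical page summary: its LSN and the verification result."""
    lsn: int
    checksum_ok: bool


def should_report_failure(page: Page, control: ControlFileSnapshot) -> bool:
    """Decide whether a checksum mismatch should be reported as an error."""
    if not control.checksums_enabled:
        # Checksums were off when we read the flag: nothing to verify.
        return False
    if page.checksum_ok:
        return False
    if page.lsn > control.checkpoint_lsn:
        # Page changed after the checkpoint we know about. The flag may
        # have been switched off since then, so the failure is ignored.
        return False
    # Page is no newer than the checkpoint, when checksums were known
    # to be enabled, so the mismatch is reported.
    return True
```

Whether this rule covers every case is exactly the open question; the
sketch just spells out what "ignore failures with a newer LSN" would
mean for a tool that reads the control file only once.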


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
