|From:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>|
|To:||Stephen Frost <sfrost(at)snowman(dot)net>|
|Cc:||Daniel Gustafsson <daniel(at)yesql(dot)se>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>|
|Subject:||Re: Online enabling of checksums|
On 09/29/2018 06:51 PM, Stephen Frost wrote:
> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>> On 09/29/2018 02:19 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>>>> While looking at the online checksum verification patch (which I guess
>>>> will get committed before this one), it occurred to me that disabling
>>>> checksums may need to be more elaborate, to protect against someone
>>>> using the stale flag value (instead of simply switching to "off"
>>>> assuming that's fine).
>>>> The signals etc. seem good enough for our internal stuff, but what if
>>>> someone uses the flag in a different way? E.g. the online checksum
>>>> verification runs as an independent process (i.e. not a backend) and
>>>> reads the control file to find out if the checksums are enabled or not.
>>>> So if we just switch from "on" to "off" that will break.
>>>> Of course, we may also say "Don't disable checksums while online
>>>> verification is running!" but that's not ideal.
>>> I'm not really sure what else we could say here..? I don't particularly
>>> see an issue with telling people that if they disable checksums while
>>> they're running a tool that's checking the checksums that they're going
>>> to get odd results.
>> I don't know, to be honest. I was merely looking at the online
>> verification patch and realized that if someone disables checksums it
>> won't notice it (because it only reads the flag once, at the very
>> beginning) and will likely produce bogus errors.
>> Although, maybe it won't - it now uses a checkpoint LSN, so that might
>> fix it. The checkpoint LSN is read from the same controlfile as the
>> flag, so we know the checksums were enabled during that checkpoint. So
>> if we ignore failures with a newer LSN, that should do the trick, no?
>> So perhaps that's the right "protocol" to handle this?
> I certainly don't think we need to do anything more.
Not sure I agree. I'm not suggesting we absolutely have to write a huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into the docs.
FWIW pg_basebackup (in the default "verify checksums" mode) has this
issue too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.
But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.
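To make the checkpoint LSN idea concrete, here is a minimal sketch of the
rule as I understand it. It is illustrative only: `Page`, `classify_page`
and the plain-integer LSNs are hypothetical names made up for this example,
not what the verification patch or pg_basebackup actually implement. It
also assumes the LSN read from a failing page can be trusted, which may not
hold for a genuinely corrupted page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int           # LSN found in the page header (hypothetical field)
    checksum_ok: bool  # result of recomputing and comparing the checksum


def classify_page(page: Page, checkpoint_lsn: int) -> str:
    """Decide how to report a page, given the checkpoint LSN that was read
    from the control file together with the "checksums enabled" flag."""
    if page.checksum_ok:
        return "ok"
    # The page was modified after the checkpoint we know about. Checksums
    # may have been disabled since then, so a mismatch here proves nothing.
    if page.lsn > checkpoint_lsn:
        return "skipped"
    # The page is no newer than the checkpoint, when checksums were known
    # to be enabled, so the mismatch is reported as a real failure.
    return "corrupt"
```

With a checkpoint LSN of 200, a failing page with LSN 300 would be skipped,
while a failing page with LSN 150 would still be reported.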
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services