Re: Online enabling of checksums

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Online enabling of checksums
Date: 2019-01-31 10:57:09
Message-ID: 20190131105709.yhpwqmmhe2og6kws@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:
>
>
> On 09/29/2018 06:51 PM, Stephen Frost wrote:
> > Greetings,
> >
> > * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> >> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> >>> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> >>>> While looking at the online checksum verification patch (which I guess
> >>>> will get committed before this one), it occurred to me that disabling
> >>>> checksums may need to be more elaborate, to protect against someone
> >>>> using the stale flag value (instead of simply switching to "off"
> >>>> assuming that's fine).
> >>>>
> >>>> The signals etc. seem good enough for our internal stuff, but what if
> >>>> someone uses the flag in a different way? E.g. the online checksum
> >>>> verification runs as an independent process (i.e. not a backend) and
> >>>> reads the control file to find out if the checksums are enabled or not.
> >>>> So if we just switch from "on" to "off" that will break.
> >>>>
> >>>> Of course, we may also say "Don't disable checksums while online
> >>>> verification is running!" but that's not ideal.
> >>>
> >>> I'm not really sure what else we could say here..? I don't particularly
> >>> see an issue with telling people that if they disable checksums while
> >>> they're running a tool that's checking the checksums that they're going
> >>> to get odd results.
> >>
> >> I don't know, to be honest. I was merely looking at the online
> >> verification patch and realized that if someone disables checksums it
> >> won't notice it (because it only reads the flag once, at the very
> >> beginning) and will likely produce bogus errors.
> >>
> >> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> >> fix it. The checkpoint LSN is read from the same controlfile as the
> >> flag, so we know the checksums were enabled during that checkpoint. So
> >> if we ignore failures with a newer LSN, that should do the trick, no?
> >>
> >> So perhaps that's the right "protocol" to handle this?
> >
> > I certainly don't think we need to do anything more.
> >
>
> Not sure I agree. I'm not suggesting we absolutely have to write a huge
> amount of code to deal with this issue, but I hope we agree we need to
> at least understand the issue so that we can put warnings into docs.
>
> FWIW pg_basebackup (in the default "verify checksums" mode) has this issue
> too AFAICS, and it seems rather unfriendly to just start reporting
> checksum errors during backup in that case.
>
> But as I mentioned, maybe there's no problem at all and using the
> checkpoint LSN deals with it automatically.
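
For reference, the checkpoint-LSN rule proposed in the quoted text could
look roughly like the sketch below. It is only an illustration under
stated assumptions, not code from the patch: the names ControlSnapshot,
Page and classify_page are hypothetical, LSNs are modelled as plain
integers, and the flag and checkpoint LSN are assumed to come from a
single read of the control file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSnapshot:
    """Hypothetical: values taken from one read of the control file."""
    checksums_enabled: bool
    checkpoint_lsn: int  # checkpoint LSN read together with the flag


@dataclass(frozen=True)
class Page:
    """Hypothetical: the page header LSN and the result of verifying it."""
    lsn: int
    checksum_ok: bool


def classify_page(snapshot: ControlSnapshot, page: Page) -> str:
    """Return 'ok', 'skipped' or 'corrupt' for a single page."""
    if not snapshot.checksums_enabled:
        # Checksums were off when the control file was read: nothing to verify.
        return "skipped"
    if page.checksum_ok:
        return "ok"
    if page.lsn > snapshot.checkpoint_lsn:
        # The page changed after the checkpoint the flag belongs to, so the
        # flag may be stale for it; ignore the failure instead of reporting it.
        return "skipped"
    # Failure on a page not newer than the checkpoint: checksums were known
    # to be enabled at that point, so report it.
    return "corrupt"
```

With this rule, a failure is reported only for pages whose LSN is not
newer than the checkpoint LSN, which is the case where the flag value is
known to have been valid.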

Given that this patch has not seen any development in a few months, I don't
see why this has an active 2019-01 CF entry. I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/

Greetings,

Andres Freund
