Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Date: 2021-01-07 22:03:59
Message-ID: CAH2-Wzk2+M_=MuUGHJnWxCSfFxNt-3mqt02KTL92qLuqKtyxng@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 7, 2021 at 1:14 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> I expected there'd be some disagreement on this, but I do continue to
> feel that it's sensible to enable checksums by default. I also don't
> think there's anything particularly wrong with such a difference of
> opinion, though it likely means that we're going to continue on with the
> status quo, where, certainly, very many deployments enable it even
> though the upstream default is to have it disabled.

I agree with all that.

> This certainly
> isn't the only place that's done, though we've been working to improve
> that situation with things like trying to get rid of 'trust' being used
> in our default pg_hba.conf.

That seems like an easier case to make to me.

> Short answer is 'yes', as mentioned down-thread and having checksums was
> a pre-requisite to deploying PG in RDS (or so folks very involved in RDS
> have told me previously, and I'll also note that it was 9.3 that was
> first deployed as part of RDS). I don't think there's any question that
> they're using --data-checksums and that it is, in fact, the actual
> original PG checksum code (or at least was at 9.3, though I've further
> heard comments that they actively try to minimize the delta between RDS
> and PG).

I accept that.

> Nope, the risk from not having fsync was clearly understood, and still
> is, to be a larger risk than not having checksums. That doesn't mean
> there's no risk to not having checksums or that we simply shouldn't
> consider checksums to be worthwhile or that we shouldn't have them on by
> default. I outlined them together in that they're both there to address
> the risk that "something doesn't go right", but, as I said previously
> and again above, the level of risk between the two isn't the same. That
> doesn't mean we shouldn't consider that checksums *do* address a risk
> and consider enabling them by default, even with the performance impact
> that they have today.

Fair.

> Much of this line of discussion seems to be, incorrectly, focused on my
> mere mention of viewing the use of fsync and checksums as mechanisms for
> addressing certain risks, but that doesn't seem to be a terribly
> fruitful direction to be going in. I'm not suggesting that we should go
> turn off fsync by default simply because we don't have checksums on by
> default, which seems to be the implication.

I admit that I saw red. This was a direct result of your bogus
argument, which greatly overstated the case in favor of enabling
checksums by default. I regret my role in that now, though. It would
be good to debate the actual issue, but that isn't what I saw.
Everyone knows the principles behind checksums and how they're useful
-- it doesn't need to be a part of the discussion.

I think that it should be possible to make a much better case in favor
of enabling checksums by default. On further reflection I actually
don't think that the real-world VACUUM overhead is anything like 15x,
though the details are complex. I might be willing to help with this
analysis, but since you only seem to want to discuss the question in a
narrow way (e.g. "I agree that improving compression performance would
be good but I don't see that as relevant to the question of what our
defaults should be"), I have to wonder if it's worth the trouble.

--
Peter Geoghegan
