Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Date: 2021-01-06 17:02:40
Message-ID: 20210106170240.GG27507@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2021-01-04 19:11:43 +0100, Michael Banck wrote:
> > On Saturday, 02.01.2021 at 10:47 -0500, Stephen Frost wrote:
> > > * Michael Paquier (michael(at)paquier(dot)xyz) wrote:
> > > > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:
> > > > > I think enough people use data checksums these days that it warrants to
> > > > > be moved into the "normal part", like in the attached.
> > > >
> > > > +1. Let's see first what others think about this change.
> > >
> > > I agree with this, but I'd also like to propose, again, as has been
> > > discussed a few times, making it the default too.
>
> FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
> logging overhead, the copy we do via PageSetChecksumCopy() shows up
> quite significantly in profiles here. Together with the checksums
> computation that's *halving* write throughput on fast drives in my aio
> branch.

Our defaults are not going to win any performance trophies and so I
don't see the value in stressing over it here.

> > This looks much better from the WAL size perspective, there's now almost
> > no additional WAL. However, that is because pgbench doesn't do TOAST, so
> > in a real-world example it might still be quite larger. Also, the vacuum
> > runtime is still 15x longer.
>
> That's obviously an issue.

It'd certainly be nice to figure out a way to improve the VACUUM run but
I don't think the impact on the time to run VACUUM is really a good
reason to not move forward with changing the default.

> > So maybe we should switch on wal_compression if we enable data checksums
> > by default.

That does seem like a good idea to me, +1 to also changing that.

> It unfortunately also hurts other workloads. If we moved towards a saner
> compression algorithm that'd perhaps not be an issue anymore...

I agree that improving compression performance would be good but I don't
see that as relevant to the question of what our defaults should be.

In my view, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance? Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Thanks,

Stephen
