Re: Enable data checksums by default

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 16:32:10
Message-ID: 20ce4ea9-47f3-9903-01b5-00eaf349b09d@2ndquadrant.com
Lists: pgsql-hackers

On 3/22/19 5:10 PM, Andres Freund wrote:
> Hi,
>
> On 2019-03-22 12:07:22 -0400, Tom Lane wrote:
>> Christoph Berg <myon(at)debian(dot)org> writes:
>>> I think, the next step in that direction would be to enable data
>>> checksums by default. They make sense in most setups,
>>
>> Well, that is exactly the point that needs some proof, not just
>> an unfounded assertion.
>>
>> IMO, the main value of checksums is that they allow the Postgres
>> project to deflect blame. That's nice for us but I'm not sure
>> that it's a benefit for users. I've seen little if any data to
>> suggest that checksums actually catch enough problems to justify
>> the extra CPU costs and the risk of false positives.
>

I'm not sure checksums are an effective tool for deflecting blame.
Consider the recent fsync retry issues - because of the assumption that
fsync can simply be retried, we may have lost some writes, resulting in
torn pages and checksum failures. We could argue about how much sense
that fsync behavior makes, but I doubt checksum failures alone are
enough to deflect blame here.
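To make the torn-page case concrete, here is a minimal sketch of why a
per-page checksum catches a partial write. CRC32 is used purely as an
illustrative stand-in - PostgreSQL's actual page checksum is an
FNV-1a-derived algorithm, and real pages store the checksum in the page
header rather than alongside the page:

```python
import zlib

PAGE_SIZE = 8192
HALF = PAGE_SIZE // 2

def page_checksum(page: bytes) -> int:
    # Illustrative stand-in for a per-page checksum; PostgreSQL's real
    # algorithm is FNV-1a based, not CRC32.
    return zlib.crc32(page)

old_page = b"\x11" * PAGE_SIZE      # version already on disk
new_page = b"\x22" * PAGE_SIZE      # version being written out
expected = page_checksum(new_page)  # checksum recorded for the new write

# A torn page: only the first half of the write reached disk; the
# second half still holds the old contents (e.g. a lost 4 kB sector).
torn_page = new_page[:HALF] + old_page[HALF:]

print(page_checksum(torn_page) == expected)  # False: the tear is caught
```

The point being that the checksum flags the mismatch regardless of
whether the underlying cause was storage corruption or a lost write
after a failed fsync - the failure report looks the same either way.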

> IDK, being able to verify in some form that backups aren't corrupted on
> an IO level is mighty nice. That often does allow to detect the issue
> while one still has older backups around.
>

Yeah, I agree that's a valuable capability. I think the question is how
effective it actually is, considering how much storage has changed over
the past few years (which necessarily affects the types of failures
people have to deal with).

> My problem is more that I'm not confident the checks are mature
> enough. The basebackup checks are atm not able to detect random data,
> and neither basebackup nor backend checks detect zeroed out files/file
> ranges.
>

Yep :-( The pg_basebackup vulnerability to random garbage in a page
header is unfortunate; we had better improve that.

It's not clear to me what checksums can do about zeroed pages (and/or
truncated files), though.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
