Re: Checksums by default?

From: Torsten Zuehlsdorff <mailinglists(at)toco-domains(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksums by default?
Date: 2017-01-24 08:05:35
Message-ID: dc9adb10-a026-6850-8ad3-e8d44a3629d4@toco-domains.de
Lists: pgsql-hackers

On 21.01.2017 19:37, Stephen Frost wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>>> Because I see having checksums as, frankly, something we always should
>>> have had (as most other databases do, for good reason...) and because
>>> they will hopefully prevent data loss. I'm willing to give us a fair
>>> bit to minimize the risk of losing data.
>>
>> To be perfectly blunt, that's just magical thinking. Checksums don't
>> prevent data loss in any way, shape, or form. In fact, they can *cause*
>> data loss, or at least make it harder for you to retrieve your data,
>> in the event of bugs causing false-positive checksum failures.
>
> This is not a new argument, at least to me, and I don't agree with it.

I don't agree either. Yes, statistically it is more likely that checksums
cause data loss: the I/O is greater, so the disk has more work to do
and wears out sooner.
But the same is true for RAID: adding more disks increases the odds of a
disk failure.

So: yes. If you use checksums on a single disk, it's more likely to cause
problems. But if you manage it right (as ZFS does, for example) it's an
overall gain.

>> What checksums can do for you, perhaps, is notify you in a reasonably
>> timely fashion if you've already lost data due to storage-subsystem
>> problems. But in a pretty high percentage of cases, that fact would
>> be extremely obvious anyway, because of visible data corruption.
>
> Exactly, and that awareness will allow a user to prevent further data
> loss or corruption. Slow corruption over time is a very much known and
> accepted real-world case that people do experience, as well as bit
> flipping enough for someone to write a not-that-old blog post about
> them:
>
> https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1
>
> A really nice property of checksums on pages is that they also tell you
> what data you *didn't* lose, which can be extremely valuable.

Indeed!
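
As a rough illustration of that last point, here is a minimal sketch. It
assumes CRC-32 from Python's zlib, which is not the algorithm PostgreSQL
uses for its page checksums, and the function names are made up for this
example. It only shows how per-page checksums separate the pages that
survived from the ones that did not:

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size; used here only for illustration


def checksum_pages(data: bytes, page_size: int = PAGE_SIZE) -> list[int]:
    """Return one CRC-32 per page-sized chunk of data (hypothetical helper)."""
    return [zlib.crc32(data[i:i + page_size]) for i in range(0, len(data), page_size)]


def classify_pages(data: bytes, stored: list[int], page_size: int = PAGE_SIZE):
    """Compare current checksums with stored ones; return (intact, damaged) page numbers."""
    current = checksum_pages(data, page_size)
    intact = [n for n, (now, then) in enumerate(zip(current, stored)) if now == then]
    damaged = [n for n, (now, then) in enumerate(zip(current, stored)) if now != then]
    return intact, damaged


# Four pages of sample data, checksummed while still known to be good.
original = bytes(range(256)) * (4 * PAGE_SIZE // 256)
stored = checksum_pages(original)

# Flip a single bit inside page 2 to simulate silent corruption.
corrupted = bytearray(original)
corrupted[2 * PAGE_SIZE + 100] ^= 0x01

intact, damaged = classify_pages(bytes(corrupted), stored)
print("intact pages:", intact)
print("damaged pages:", damaged)
```

The checksum repairs nothing, but it does tell you that only page 2 needs
to come from a backup and that the other pages can still be trusted.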

Greetings,
Torsten
