Re: Checksums by default?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksums by default?
Date: 2017-01-24 13:22:14
Message-ID: 20170124132214.GL18360@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Ants Aasma (ants(dot)aasma(at)eesti(dot)ee) wrote:
> On Tue, Jan 24, 2017 at 4:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Peter Geoghegan <pg(at)heroku(dot)com> writes:
> >> I thought that checksums went in in part because we thought that there
> >> was some chance that they'd find bugs in Postgres.
> >
> > Not really. AFAICS the only point is to catch storage-system malfeasance.
>
> This matches my understanding. Actual physical media errors are caught
> by lower level checksums/error correction codes, and memory errors are
> caught by ECC RAM.

Not everyone runs with ECC, sadly.

> Checksums do very little for PostgreSQL bugs, which
> leaves only filesystem and storage firmware bugs. However the latter
> are still reasonably common faults.

Agreed, but in addition to filesystem and storage firmware bugs,
virtualization systems can have bugs as well. If those bugs hit the
kernel's cache (which is actually the more likely case, since that's
what the VM system is going to think it can monkey with, as long as it
works with the kernel), then PG's checksum would likely catch them: we
verify the checksum when we read a page from the kernel's read cache,
and calculate it before we push the page to the kernel's write cache.
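
As a rough illustration of that read/write ordering, here is a minimal
sketch. It is not PostgreSQL's code: the dict standing in for the kernel
cache, the 4-byte header, and the names write_page, read_page and
zero_chunk are all assumptions made for the example, and zlib.crc32 is
only a stand-in for PostgreSQL's own checksum algorithm and page layout.

```python
import zlib

HEADER = 4  # bytes reserved for the stand-in checksum (assumption, not PG's layout)


class ChecksumError(Exception):
    """Raised when a page read back from the cache fails verification."""


def page_checksum(payload: bytes) -> int:
    # Stand-in only: CRC-32 from zlib, not PostgreSQL's checksum algorithm.
    return zlib.crc32(payload)


def write_page(cache: dict, blkno: int, payload: bytes) -> None:
    # The checksum is calculated *before* the page is handed to the
    # (simulated) kernel write cache.
    cache[blkno] = page_checksum(payload).to_bytes(HEADER, "little") + payload


def read_page(cache: dict, blkno: int) -> bytes:
    # The checksum is verified when the page comes back from the
    # (simulated) kernel read cache.
    raw = cache[blkno]
    stored = int.from_bytes(raw[:HEADER], "little")
    payload = raw[HEADER:]
    if page_checksum(payload) != stored:
        raise ChecksumError(f"page verification failed for block {blkno}")
    return payload


def zero_chunk(cache: dict, blkno: int, offset: int, length: int) -> None:
    # Simulates something below us zeroing part of a cached page.
    raw = bytearray(cache[blkno])
    raw[HEADER + offset:HEADER + offset + length] = bytes(length)
    cache[blkno] = bytes(raw)
```

Anything that alters the page between write_page and read_page, such as
zero_chunk here, shows up as a ChecksumError on the next read instead of
being returned as if it were valid data.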

> I have seen multiple cases where,
> after reviewing the corruption with a hex editor, the only reasonable
> conclusion was a bug in the storage system. Data shifted around by
> non-page size amounts, non-page aligned extents that are zeroed out,
> etc.

Right, I've seen similar kinds of things happening in the memory of
virtualized systems; things like random chunks of memory suddenly being
zeroed.

> Unfortunately none of those customers had checksums turned on at
> the time. I feel that reacting to such errors with a non-cryptic and
> easily debuggable checksum error is much better than erroring out with
> huge memory allocations, crashing or returning bogus data. Timely
> reaction to data corruption is really important for minimizing data
> loss.

Agreed.

In addition to that, in larger environments where there are multiple
databases involved for the explicit purpose of fail-over, a system which
is going south because of bad memory or storage could be detected and
pulled out, potentially with zero data loss. Of course, to minimize
data loss, it'd be extremely important for the fail-over system to
identify a checksum error more-or-less immediately and take the bad node
out.
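
To make the "more-or-less immediately" part concrete, here is a hedged
sketch of the check a fail-over monitor might run over a node's server
log. The function names and the fencing decision are hypothetical, and
the marker string is assumed to match the warning the server logs when
page verification fails; adjust it to whatever your version emits.

```python
from typing import Iterable, Optional

# Assumed marker text for a checksum failure in the server log.
CHECKSUM_MARKER = "page verification failed"


def first_checksum_failure(log_lines: Iterable[str]) -> Optional[int]:
    """Return the index of the first log line reporting a checksum
    failure, or None if no such line is present."""
    for i, line in enumerate(log_lines):
        if CHECKSUM_MARKER in line:
            return i
    return None


def should_fence(log_lines: Iterable[str]) -> bool:
    # Hypothetical policy: a single checksum failure is enough to take
    # the node out of rotation.
    return first_checksum_failure(log_lines) is not None
```

A real monitor would tail the log continuously and act on the first
match rather than scanning in batches; the point is only that a checksum
failure is an unambiguous signal to key the decision on.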

Thanks!

Stephen
