Re: Checksums by default?

From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksums by default?
Date: 2017-01-25 20:22:23
Message-ID: CA+CSw_vc6vMajszzDMpLOHxG=2a16SD+pKt0Us1uqUmsWnOnsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 25, 2017 at 8:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Also, it's not as if there are no other ways of checking whether your
> disks are failing. SMART, for example, is supposed to tell you about
> incipient hardware failures before PostgreSQL ever sees a bit flip.
> Surely an average user would love to get a heads-up that their
> hardware is failing even when that hardware is not being used to power
> PostgreSQL, yet many people don't bother to configure SMART (or
> similar proprietary systems provided by individual vendors).

You really can't rely on SMART to tell you about hardware failures. 1
in 4 drives fail completely with 0 SMART indication [1]. And for the 1
in 1000 annual checksum failure rate other indicators except system
restarts only had a weak correlation[2]. And this is without
filesystem and other OS bugs that SMART knows nothing about.

My view may be biased by mostly seeing the cases where things have
already gone wrong, but I recommend support clients to turn checksums
on unless it's known that write IO is going to be an issue. Especially
because I know that if it turns out to be a problem I can go in and
quickly hack together a tool to help them turn it off. I do agree that
to change the PostgreSQL default at least some tool turn it off online
should be included.

[1] https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/
[2] https://www.usenix.org/legacy/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf

Regards,
Ants Aasma

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-01-25 20:23:21 Re: Checksums by default?
Previous Message Stephen Frost 2017-01-25 20:17:32 Re: pg_ls_dir & friends still have a hard-coded superuser check