Re: Enabling Checksums

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Enabling Checksums
Date: 2013-03-08 08:38:15
Message-ID: 5139A377.1040905@vmware.com
Lists: pgsql-hackers

On 08.03.2013 05:31, Bruce Momjian wrote:
> Also, don't all modern storage drives have built-in checksums, and
> report problems to the system administrator? Does smartctl help report
> storage corruption?
>
> Let me take a guess at answering this --- we have several layers in a
> database server:
>
> 1 storage
> 2 storage controller
> 3 file system
> 4 RAM
> 5 CPU
>
> My guess is that storage checksums only cover layer 1, while our patch
> covers layers 1-3, and probably not 4-5 because we only compute the
> checksum on write.

There is a thing called "Data Integrity Field" (DIF) and/or "Data
Integrity Extensions", which allows storing a checksum with each disk
sector and verifying that checksum in each layer. The basic idea is that
instead of 512 byte sectors, the drive is formatted to use 520 byte
sectors, with the extra 8 bytes used for the checksum and some other
metadata. That gets around the problem we have in PostgreSQL, and that
filesystems have, which is that you need to store the checksum somewhere
along with the data.

When a write I/O request is made in the OS, the OS calculates the
checksum and passes it through the controller to the drive. The drive
verifies the checksum, and aborts the I/O request if it doesn't match.
On a read, the checksum is read from the drive along with the actual
data, passed through the controller, and the OS verifies it. This covers
layers 1-2 or 1-3.
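
To make the write and read paths above concrete, here is a rough Python
simulation of the idea. It is only a sketch under assumptions: the
function and class names are made up for illustration, the split of the
8 extra bytes (2-byte checksum, 2 unused bytes, 4-byte sector number) is
an assumed layout, and binascii.crc_hqx is a stand-in 16-bit CRC rather
than whatever the hardware actually computes.

```python
import binascii
import struct

SECTOR_DATA = 512  # payload bytes per sector
SECTOR_META = 8    # extra integrity bytes, giving 520-byte sectors


def checksum(data: bytes) -> int:
    # Stand-in 16-bit CRC from the standard library (an assumption);
    # real DIF-capable hardware defines its own checksum.
    return binascii.crc_hqx(data, 0)


def os_prepare_write(data: bytes, lba: int) -> bytes:
    """OS side of a write: append checksum and metadata to the payload."""
    if len(data) != SECTOR_DATA:
        raise ValueError("payload must be exactly 512 bytes")
    # Assumed layout: 2-byte checksum, 2 unused bytes, 4-byte sector number.
    meta = struct.pack(">HHI", checksum(data), 0, lba & 0xFFFFFFFF)
    return data + meta


def verify(sector: bytes, lba: int) -> bytes:
    """Check a 520-byte sector and return its payload, or raise OSError."""
    if len(sector) != SECTOR_DATA + SECTOR_META:
        raise OSError("unexpected sector size")
    data, meta = sector[:SECTOR_DATA], sector[SECTOR_DATA:]
    stored_sum, _unused, stored_lba = struct.unpack(">HHI", meta)
    if stored_sum != checksum(data) or stored_lba != lba & 0xFFFFFFFF:
        raise OSError("integrity check failed")
    return data


class Drive:
    """Hypothetical drive that refuses writes whose checksum is wrong."""

    def __init__(self):
        self.sectors = {}

    def write(self, lba: int, sector: bytes) -> None:
        verify(sector, lba)         # drive-side check; aborts the I/O on mismatch
        self.sectors[lba] = sector  # checksum is stored along with the data

    def read(self, lba: int) -> bytes:
        return self.sectors[lba]    # returns data plus the 8 extra bytes


def os_verify_read(sector: bytes, lba: int) -> bytes:
    """OS side of a read: verify what came back through the controller."""
    return verify(sector, lba)
```

A sector damaged between os_prepare_write() and Drive.write() is rejected
by the drive, and a sector damaged after it was stored is caught by
os_verify_read(). Neither check says anything about corruption that
happens before the OS computes the checksum.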

Now, this requires all the components to have support for that. I'm not
an expert on these things, but I'd guess that that's a tall order today.
I don't know which hardware vendors and kernel versions support that.
But things usually keep improving, and hopefully in a few years, you can
easily buy a hardware stack that supports DIF all the way through.

In theory, the OS could also expose the DIF field to the application, so
that you get end-to-end protection from the application to the disk.
This means that the application somehow gets access to those extra bytes
in each sector, and you have to calculate and verify the checksum in the
application. There are no standard APIs for that yet, though.

See https://www.kernel.org/doc/Documentation/block/data-integrity.txt.

- Heikki
