Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Enabling Checksums
Date: 2013-01-16 00:36:46
Message-ID: 50F5F61E.2000806@2ndQuadrant.com
Lists: pgsql-hackers
The first revision of a simple corruption program is attached, in very
C-ish Python.  The parameters I settled on are a relation name, byte
offset, byte value, and the operation to apply:  overwrite, AND, OR, or
XOR.  I like XOR here because you can undo the damage just by running
the program again with the same arguments.  Rewriting this in C would
not be terribly difficult; most of the time spent on this version went
into figuring out what to do.
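
As a rough illustration of the four operations described above, here is a minimal Python sketch. It is not the attached pg_corrupt script; the function name corrupt_byte and the OPERATIONS table are hypothetical, and it works on a plain file path rather than resolving a relation name. It shows why XOR is self-reversing: applying the same value twice restores the original byte.

```python
import operator
import os
import tempfile

# Hypothetical operation table; the keys mirror the four modes listed above.
OPERATIONS = {
    "overwrite": lambda old, value: value,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def corrupt_byte(path, offset, value, op="xor"):
    """Apply op to the single byte at offset in path; return (old, new)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    with open(path, "r+b") as f:
        f.seek(offset)
        raw = f.read(1)
        if len(raw) != 1:
            raise ValueError("offset is past the end of the file")
        old = raw[0]
        new = OPERATIONS[op](old, value) & 0xFF
        f.seek(offset)
        f.write(bytes([new]))
    return old, new


if __name__ == "__main__":
    # Demonstrate on a scratch file, never on a live data directory.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(14) + b"\x80\x01")
    try:
        print(corrupt_byte(tmp.name, 14, 1))  # (128, 129)
        print(corrupt_byte(tmp.name, 14, 1))  # (129, 128), XOR undone
    finally:
        os.remove(tmp.name)
```

Overwrite, AND, and OR generally lose information about the old byte, so only XOR can be reversed without having saved the original value.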

This follows Jeff's idea that the most subtle corruption is the hardest
to spot, so testing should aim at the smallest unit of change.  If
checksums can spot a one-bit error in an unused byte of a page,
presumably they will also catch large errors like a byte swap.  I find
some grim amusement that the checksum performance testing I've been
trying to do got stuck behind a problem with a tiny, hard-to-detect
single bit of corruption.
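
To make the "smallest unit of change" idea concrete, here is a small sketch that flips single bits in a page-sized buffer and checks that a checksum notices each one. zlib.crc32 is used purely as a stand-in; it is not claimed to be the algorithm used by the checksum patch, and the helper flip_bit is hypothetical.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size


def flip_bit(page, byte_offset, bit):
    """Return a copy of page with one bit inverted."""
    out = bytearray(page)
    out[byte_offset] ^= 1 << bit
    return bytes(out)


page = bytes(PAGE_SIZE)  # an all-zero stand-in for a real page
original = zlib.crc32(page)

# Try every bit of a few sample bytes, including the first and last.
changed = [
    zlib.crc32(flip_bit(page, offset, bit)) != original
    for offset in (0, 14, PAGE_SIZE - 1)
    for bit in range(8)
]
print(all(changed))  # True
```

A CRC detects every single-bit error by construction, so this case always passes for CRC-32; other checksum algorithms would need the same kind of test run against them.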

Here's pgbench_accounts being corrupted; the byte changed is the next
to last one on the first hexdump line:

$ pgbench -i -s 1
$ ./pg_corrupt pgbench_accounts show
Reading byte 0 within file /usr/local/var/postgres/base/16384/25242
Current byte= 0 / $00
$ hexdump /usr/local/var/postgres/base/16384/25242 | head
0000000 00 00 00 00 00 00 00 00 00 00 04 00 0c 01 80 01
...
$ ./pg_corrupt pgbench_accounts 14 1
/usr/local/var/postgres base/16384/25242 8192 13434880 1640
Reading byte 14 within file /usr/local/var/postgres/base/16384/25242
Current byte= 128 / $80
Modified byte= 129 / $81
File modified successfully
$ hexdump /usr/local/var/postgres/base/16384/25242 | head
0000000 00 00 00 00 00 00 00 00 00 00 04 00 0c 01 81 01

That doesn't impact selecting all of the rows:

$ psql -c "select count(*) from pgbench_accounts"
  count
--------
  100000

And pg_dump works fine against the table too.  Tweaking this byte looks 
like a reasonable first test case for seeing if checksums can catch an 
error that query execution doesn't.
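
For readers who want to see which page header field that byte belongs to, the sketch below decodes the 16 bytes shown in the hexdump output. The field layout is an assumption based on the PageHeaderData struct in src/include/storage/bufpage.h on a little-endian machine; the message itself does not name the field, and the helpers decode and field_at are hypothetical.

```python
import struct

# Assumed layout of the first 16 bytes of a page header (little-endian):
# two 32-bit halves of pd_lsn, then four 16-bit fields. Bytes 8-9 hold
# pd_tli in 9.2-era headers.
HEADER_FORMAT = "<IIHHHH"
FIELDS = (
    "pd_lsn.xlogid",
    "pd_lsn.xrecoff",
    "pd_tli",
    "pd_flags",
    "pd_lower",
    "pd_upper",
)


def decode(first16):
    """Map the first 16 header bytes onto the assumed field names."""
    return dict(zip(FIELDS, struct.unpack(HEADER_FORMAT, first16)))


def field_at(offset):
    """Return the name of the assumed field that contains byte offset."""
    pos = 0
    for name, code in zip(FIELDS, HEADER_FORMAT[1:]):
        size = struct.calcsize("<" + code)
        if pos <= offset < pos + size:
            return name
        pos += size
    raise ValueError("offset is outside the decoded header bytes")


# Bytes copied from the two hexdump lines in the transcript above.
before = bytes.fromhex("00 00 00 00 00 00 00 00 00 00 04 00 0c 01 80 01")
after = bytes.fromhex("00 00 00 00 00 00 00 00 00 00 04 00 0c 01 81 01")

print(field_at(14), decode(before)["pd_upper"], decode(after)["pd_upper"])
# pd_upper 384 385
```

Under that assumed layout, byte 14 is the low byte of pd_upper, so the tweak shifts the recorded end of free space on the first page from 384 to 385.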

Next I'm going to test the functional part of the latest checksum patch,
duplicate Jeff's targeted performance tests, and then run some of my
own.  I wanted to get this little tool circulating first, now that it's
useful.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment: pg_corrupt
Description: text/plain (4.3 KB)
