Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Enabling Checksums
Date: 2013-01-16 00:36:46
Message-ID: 50F5F61E.2000806@2ndQuadrant.com
Lists: pgsql-hackers

First rev of a simple corruption program is attached, in very C-ish
Python. The parameters I settled on are to accept a relation name, byte
offset, byte value, and what sort of operation to do: overwrite, AND,
OR, or XOR. I like XOR here because you can undo the damage just by
running the program again with the same arguments. Rewriting this in C
would not be terribly difficult, and
most of the time spent on this version was figuring out what to do.
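
As a rough illustration of the four operations, here is a minimal sketch
that applies one of them to a single byte of a file. It is not the attached
pg_corrupt: the names corrupt_byte and OPERATIONS are hypothetical, and it
takes a plain file path rather than looking up a relation name. It runs
against a scratch file so that nothing real gets damaged.

```python
import os
import tempfile

# Hypothetical names for this sketch; the attached pg_corrupt may be
# structured quite differently.
OPERATIONS = {
    "overwrite": lambda old, value: value,
    "and": lambda old, value: old & value,
    "or": lambda old, value: old | value,
    "xor": lambda old, value: old ^ value,
}


def corrupt_byte(path, offset, value, operation="xor"):
    """Apply one byte-level operation at `offset`; return (old, new)."""
    if not 0 <= value <= 255:
        raise ValueError("byte value must be in 0..255")
    combine = OPERATIONS[operation]
    with open(path, "r+b") as f:
        f.seek(offset)
        data = f.read(1)
        if len(data) != 1:
            raise ValueError("offset is past the end of the file")
        old = data[0]
        new = combine(old, value)
        # Seek back so the write lands on the byte that was just read.
        f.seek(offset)
        f.write(bytes([new]))
    return old, new


if __name__ == "__main__":
    # Work on a scratch file, never on a live data directory.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes([0x0C, 0x01, 0x80, 0x01]))
    try:
        print(corrupt_byte(tmp.name, 2, 1, "xor"))  # flips the low bit
        print(corrupt_byte(tmp.name, 2, 1, "xor"))  # flips it back
    finally:
        os.unlink(tmp.name)
```

The second call restores the original byte, which is the property that
makes XOR convenient here. Overwrite, AND, and OR can all lose information
about the old value, so they cannot be undone the same way.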

This follows Jeff's idea that the most subtle corruption is the hardest
to spot, so testing should aim at the smallest unit of change. If you
can spot a one-bit error in an unused byte of a page, presumably the
same check will also catch larger errors like a byte swap. I find some
grim amusement that the checksum performance testing I've been trying to
do got stuck behind a problem with a tiny, hard-to-detect single bit of
corruption.

Here's pgbench_accounts being corrupted; the byte changed is the next to
last one on the first hexdump line:

$ pgbench -i -s 1
$ ./pg_corrupt pgbench_accounts show
Reading byte 0 within file /usr/local/var/postgres/base/16384/25242
Current byte= 0 / $00
$ hexdump /usr/local/var/postgres/base/16384/25242 | head
0000000 00 00 00 00 00 00 00 00 00 00 04 00 0c 01 80 01
...
$ ./pg_corrupt pgbench_accounts 14 1
/usr/local/var/postgres base/16384/25242 8192 13434880 1640
Reading byte 14 within file /usr/local/var/postgres/base/16384/25242
Current byte= 128 / $80
Modified byte= 129 / $81
File modified successfully
$ hexdump /usr/local/var/postgres/base/16384/25242 | head
0000000 00 00 00 00 00 00 00 00 00 00 04 00 0c 01 81 01
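
To see what that byte might belong to, the sketch below decodes the first
16 bytes of the block before and after the change. It assumes the standard
page header layout from the PostgreSQL storage documentation (an 8-byte
LSN followed by 16-bit fields) and a little-endian machine; neither is
stated in this message. The field names, including "tli_or_checksum", are
labels made up for this sketch.

```python
import struct

# First 16 bytes of block 0, copied from the two hexdump lines above.
BEFORE = bytes.fromhex("00 00 00 00 00 00 00 00 00 00 04 00 0c 01 80 01")
AFTER = bytes.fromhex("00 00 00 00 00 00 00 00 00 00 04 00 0c 01 81 01")

# Assumed layout, little-endian: the LSN as two 32-bit halves, then four
# 16-bit fields. Bytes 8-9 are the timeline ID or the checksum depending
# on the server version, hence the made-up name "tli_or_checksum".
FIELDS = ("lsn_xlogid", "lsn_xrecoff", "tli_or_checksum",
          "flags", "lower", "upper")


def decode_header(raw):
    """Decode the first 16 bytes of a page under the assumed layout."""
    return dict(zip(FIELDS, struct.unpack("<IIHHHH", raw[:16])))


def changed_fields(old, new):
    """Return {field: (old_value, new_value)} for fields that differ."""
    a, b = decode_header(old), decode_header(new)
    return {name: (a[name], b[name]) for name in FIELDS if a[name] != b[name]}


if __name__ == "__main__":
    print(decode_header(BEFORE))
    print(changed_fields(BEFORE, AFTER))
```

If that reading of the header is right, byte 14 is the low-order byte of
the "upper" field, and the flipped bit moves it from 384 to 385 while
every other decoded field stays the same.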

That doesn't impact selecting all of the rows:

$ psql -c "select count(*) from pgbench_accounts"
count
--------
100000

And pg_dump works fine against the table too. Tweaking this byte looks
like a reasonable first test case for seeing if checksums can catch an
error that query execution doesn't.
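
The kind of check this test case is meant to exercise can be sketched as
follows: record a checksum per page, flip one bit, and see which pages no
longer verify. CRC-32 from zlib is only a stand-in here, not the algorithm
in the checksum patch, and the pages are all-zero placeholders rather than
real table contents. The function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 8192  # block size reported in the session output above


def page_checksum(page):
    # Stand-in only: CRC-32 from zlib, not the checksum patch's algorithm.
    return zlib.crc32(page) & 0xFFFFFFFF


def checksum_pages(data, block_size=BLOCK_SIZE):
    """Return one checksum per block of `data`."""
    return [page_checksum(data[start:start + block_size])
            for start in range(0, len(data), block_size)]


def verify_pages(data, stored, block_size=BLOCK_SIZE):
    """Return the block numbers whose checksum no longer matches `stored`."""
    current = checksum_pages(data, block_size)
    return [blkno for blkno, (old, new) in enumerate(zip(stored, current))
            if old != new]


if __name__ == "__main__":
    # Three all-zero placeholder pages, not real table contents.
    original = bytes(3 * BLOCK_SIZE)
    stored = checksum_pages(original)

    damaged = bytearray(original)
    damaged[14] ^= 0x01  # same offset and bit as the session above

    print(verify_pages(original, stored))        # nothing flagged
    print(verify_pages(bytes(damaged), stored))  # block 0 flagged
```

A CRC always changes when exactly one bit of its input changes, so this
stand-in flags the damaged block. Whether the patch's checksum catches the
same flip is what the real test has to show.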

Next I'm going to test the functional part of the latest checksum patch,
duplicate Jeff's targeted performance tests, and then run some of my
own. First, though, I wanted to get this little tool circulating now
that it's useful.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
pg_corrupt text/plain 4.3 KB
