Re: Detecting DB corruption

From: Raj Gandhi <raj01gandhi(at)gmail(dot)com>
To: Gunnar Nick Bluth <gunnar(dot)bluth(at)pro-open(dot)de>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Detecting DB corruption
Date: 2012-11-01 15:10:08
Message-ID: CALU_HCNLua7XvAgVexj+YFy2etFrLT_z1sc-5OeG+KQ3xDuhUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Re-sending to correct addresses.

>>Seriously, if you're facing DB corruption then something is already
>>horribly wrong with your setup.

>"Horribly" is not strong enough a word IMHO when we're discussing double
primary key values... except if Raj is not using sequences to generate
them. Although on the other hand, in that case, it's in turn >an even more
horrible setup; questionable DB design on unreliable hardware.

>Raj, would you mind pasting your schema somewhere, at least of the tables
you experienced the corruption?

Each DB table has primary key that is populated using DB-sequence. There is
a UNIQUE constraint created on natural keys.
The problem on the test setup was because disk cache was enabled. Indexes
were corrupted when powering down the host. I have noticed that integrity
of both PK and UNIQUE constraint were violated - Table had rows with
duplicate primary keys and in other case there were rows with duplicate
unique key constraint.

We are now evaluating to turn off the disk cache to avoid this kind of
corruption.

I would still like to have some mechanism to detect any kind of corruption
in Postgres. As Craig pointed out I am planning to run "reindex database
<dbname>" which will detect any duplicates and will also remove any
corruption as indexes are recreated.

About the corruption in table - will running "VACUUM FULL" on all tables
detect the corruption?
I see 8.4 and later version has param 'vacuum_freeze_table_age' which by
setting to 0 will force regular "vacuum" to run on whole database and will
check every block. I don't see that param in 8.3 though so I guess "vacuum
full" is the only option.

If "vacuum full" is not going to detect the corruption then I am also
thinking to run "pg_dump" which should catch the corruption.

On Thu, Nov 1, 2012 at 4:06 AM, Gunnar "Nick" Bluth <
gunnar(dot)bluth(at)pro-open(dot)de> wrote:

> Am 01.11.2012 06:47, schrieb Craig Ringer:
>
> On 11/01/2012 01:10 PM, Scott Ribe wrote:
>>
>>> On Oct 31, 2012, at 8:50 PM, Craig Ringer wrote:
>>>
>>> Seriously, if you're facing DB corruption then something is already
>>>> horribly wrong with your setup.
>>>>
>>> "Horribly" is not strong enough a word IMHO when we're discussing double
> primary key values... except if Raj is not using sequences to generate
> them. Although on the other hand, in that case, it's in turn an even more
> horrible setup; questionable DB design on unreliable hardware.
>
> Raj, would you mind pasting your schema somewhere, at least of the tables
> you experienced the corruption?
>
> True, but. In a past life, complaints from the db (it was a db that
>>> stored a checksum with every block) were the very first symptom when
>>> something went horribly wrong with the hardware. (Partial short between
>>> wires of an internal SCSI cable; eventually we determined that about every
>>> 1MB, 1 bit would get flipped between the controller & disk.)
>>>
>>> So, if there were an official db verifier tool for PG, I for one would
>>> have it run periodically.
>>>
>> If there were a way to reliably detect corruption, so would I. As things
>> stand there are no block checksums, so if a bit gets flipped in some
>> random `text` field you're never going to know, corruption-checker or
>> no. Some forms of random corruption - like bad blocks on disks causing
>>
> I think checksums are currently being worked on and are to be expected for
> 9.3. Might be interesting to scan -hackers for that once more...
>
> I/O errors, zeroed blocks, truncated files, etc - will become apparent
>> with general checking, but others won't be detectable unless you know
>> what the expected vs actual data is.
>>
>> If page checksumming or any other reliable method of detecting possible
>> incipient corruption were available I'd quite likely want to use it for
>> much the same reason you outlined. For that matter, if there were a
>> general "sanity check my tables and indexes" tool I'd probably use that
>> too. However, no such tool exists - and in a good setup, none should be
>> needed. I'd want to use one anyway purely out of paranoia.
>>
>> --
>> Craig Ringer
>>
>>
>> On a side note, Raj, you might want to read the descriptions of MVCC and
> WAL once more, then re-think about your idea of updating all rows and
> rolling back the transaction. That would potentially produce the effect
> you're looking for with InnoDB or Oracle, but not with PG.
>
> Cheers,
>
> --
> Gunnar "Nick" Bluth
> RHCE/SCLA
>
> Mobil +49 172 8853339
> Email: gunnar(dot)bluth(at)pro-open(dot)de
> ______________________________**______________________________**
> ______________
> In 1984 mainstream users were choosing VMS over UNIX. Ten years later
> they are choosing Windows over UNIX. What part of that message aren't you
> getting? - Tom Payne
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Devrim GÜNDÜZ 2012-11-01 15:30:14 Re: Public key for wxBase-2.8.12-1.el5.i386.rpm is not installed for pg_admin3
Previous Message Terry Khatri 2012-11-01 14:05:38 Fwd: Errors on pg_dumpall