Re: Losing records when server hang

From: Marco Colombo <marco(at)esi(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: lec <limec(at)streamyx(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Losing records when server hang
Date: 2004-08-09 17:18:05
Message-ID: 4117B1CD.2020500@esi.it
Lists: pgsql-general

Tom Lane wrote:
> lec <limec(at)streamyx(dot)com> writes:
>
>>It's a SCSI, RAID-5 on a Dell server.
>
>
>>The hardware actually "hang". The Dell engineers came and replaced the
>>motherboard but couldn't tell what the actual fault was.
>
>
>>Commit as in 'COMMIT'. 'Records' 1,2,3,4,5,6,7,8,9,10 are actually
>>transactions. I'm as puzzled as to why I lost the transactions in the
>>middle but got the last transaction.
>
>
> I'm puzzled too. I don't suppose you have the postmaster log from when
> it was trying to recover from the crash? Or even better, copies of the
> WAL files?
>
> A possible theory has to do with corruption of the WAL log. For
> instance, transactions 1-10 are all down to disk in WAL (or at least the
> kernel told postgres the writes were done) and for one reason or another
> the buffer manager chances to flush the page containing record 10 out
> to its data file before the other records' pages. Now the system hangs.
> After reboot, if the WAL log is unreadable beyond transaction 1 then the
> database would come up with transaction 1 replayed, 2-10 not replayed,
> but 10's data is out there anyway.
>
> However this would seem to imply disk drive misfeasance above and beyond
> your motherboard problem.

Well, no. How about this theory:

1) everything is ok:
the backend executes write()/fsync() for transactions 1-5

2) hardware fails somehow at the motherboard level (imagine CPU/RAM overheating):
RAM gets corrupted - kernel starts oopsing (but goes on)
meanwhile, the backend executes write()/fsync() for transactions 6-10,
but randomly corrupted data gets written to disk.

3) unrecoverable kernel error occurs, the show stops.

On recovery, transactions 6-9 don't even look like valid log entries, while
10, for some reason, does (maybe only its data is corrupted).

I'm not familiar with the details of WAL files and post-crash recovery,
but is that possible? Or does the process stop at the first failure?
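
A toy sketch of the "stop at the first failure" behaviour asked about above, assuming a log of CRC-protected records that is replayed strictly in order. The record layout and the names make_record, replay and demo are invented for illustration; this is not PostgreSQL's actual WAL format or recovery code.

```python
import struct
import zlib

HEADER_SIZE = 12  # xid (4 bytes) + payload length (4 bytes) + CRC32 (4 bytes)


def make_record(xid, payload):
    """Encode one toy log record: xid, payload length, CRC32, payload."""
    head = struct.pack("<II", xid, len(payload))
    crc = zlib.crc32(head + payload)
    return head + struct.pack("<I", crc) + payload


def replay(log):
    """Return the xids replayed, stopping at the first record that fails validation."""
    replayed = []
    pos = 0
    while pos + HEADER_SIZE <= len(log):
        xid, length = struct.unpack_from("<II", log, pos)
        (crc,) = struct.unpack_from("<I", log, pos + 8)
        payload = log[pos + HEADER_SIZE : pos + HEADER_SIZE + length]
        # A truncated payload or a CRC mismatch ends replay: nothing after
        # this point is trusted, even if later records happen to be intact.
        if len(payload) != length or zlib.crc32(log[pos : pos + 8] + payload) != crc:
            break
        replayed.append(xid)
        pos += HEADER_SIZE + length
    return replayed


def demo():
    """Write records 1-10, damage record 6, and replay the log."""
    log = bytearray(b"".join(make_record(i, b"row-%d" % i) for i in range(1, 11)))
    # Records 1-9 are 17 bytes each, so record 6's payload starts at 5*17 + 12.
    log[5 * 17 + HEADER_SIZE] ^= 0xFF
    return replay(bytes(log))
```

In this model demo() returns only 1 through 5: record 10 is intact in the log but is never reached, so its data could only show up after reboot if its page had already been flushed to the data file, as in Tom's theory.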

Anyway, if your CPU/RAM is failing, no DB technology can save you. You
need redundant CPU/RAM units to perform the same operations concurrently,
and the hardware to validate the results on a 2vs1 basis at least.
Ask NASA, I think they know what "mission critical" actually means. :)
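
The "2vs1" validation can be sketched as a majority vote over three independent results. This is only a software illustration of the idea, assuming three replicas that each return a hashable value; the function name vote is made up here, and real systems do this in hardware.

```python
from collections import Counter


def vote(results):
    """Return the value at least two of three replicas agree on, or raise."""
    if len(results) != 3:
        raise ValueError("expected exactly three replica results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        # All three disagree: there is no majority, so no result is trusted.
        raise RuntimeError("no two replicas agree; result cannot be trusted")
    return value
```

For example, vote([42, 42, 7]) returns 42 and outvotes the single faulty unit, while three different answers raise an error instead of picking one.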

Really, when the hardware starts flipping random bits in RAM, you can't
even know how long it's been going on; it can be hours without the kernel
panicking or hanging at all. No one knows how good your data is. There's no
point in recovering a transaction if the data inside is corrupted.

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
