Re: Should mdxxx functions(e.g. mdread, mdwrite, mdsync etc) PANIC instead of ERROR when I/O failed?

From: "Jacky Leng" <lengjianquan(at)163(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Should mdxxx functions(e.g. mdread, mdwrite, mdsync etc) PANIC instead of ERROR when I/O failed?
Date: 2009-06-16 02:12:12
Message-ID: h16v1p$2q8o$1@news.hub.org
Lists: pgsql-hackers


>> I think the reasoning is that if those functions reported a PANIC the
>> chance you could recover your data is zero, because you need the
>> database system to read the other (good) data.

I do not see why a PANIC would reduce the chance of recovering my data. AFAICS,
my data is already corrupted (because of the bad block here); PANIC or not,
a read operation on the bad block should get the same result.

> Also, in the case you're complaining about, the problem was that there
> wasn't any O/S error report that we could have PANIC'd about anyhow.

No, the O/S did report the error, which led to the 453 ERROR messages from
postgres. The O/S error messages (obtained with dmesg) look like this:
end_request: I/O error, dev sda, sector 504342711
ata1: EH complete
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: (irq_stat 0x40000008)
ata1.00: cmd 60/08:00:b0:a8:0f/00:00:1e:00:00/40 tag 0 cdb 0x0 data 4096 in
         res 41/40:08:b7:a8:0f/06:00:1e:00:00/00 Emask 0x9 (media error)
ata1.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata1.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168

> We already do refuse
> to read a page into shared buffers if there's a read error on it,
> so it's not clear to me how you think that an ERROR leaves things
> in an unstable state.
>

In my case, it seems that the O/S does not guarantee that once an I/O operation
(read, write, sync, etc.) on a block has failed, all later I/O operations
on that block will also fail. For example:
1. As I noted before, although reading the bad db-block in my data failed
453 times, the 454th read succeeded (but part of the data, the bad sector,
had been set to all zeros). So even though the 453 failed reads reported
ERROR, the bad db-block could still end up being read into shared buffers.
2. Besides, I have seen a scenario like this:
1) an mdsync operation failed with the message "ERROR: could not fsync
segment XXX of relation XXX: ??";

The O/S error messages (obtained with dmesg) looked like this:
Buffer I/O error on device XX205503, logical block 43837786
lost page write due to I/O error on XX205503

2) This left a half-written db-block in my data. But the page could still
be read into shared buffers successfully later, which led to a curious
error: "ERROR: could not access status of transaction XXXXX"
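The fsync case in point 2 is where the ERROR-vs-PANIC choice matters most: the dmesg line "lost page write" suggests the kernel had already discarded the dirty page, so retrying the sync at a later checkpoint may not bring it back. On some Linux kernels a second fsync() after such a failure can even return success although the page never reached the disk. Below is a minimal C sketch of the fail-stop policy argued for here, assuming only POSIX fsync(). The names sync_segment and sync_result are hypothetical and are not PostgreSQL's md.c API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical outcome codes mirroring the ERROR/PANIC choice. */
typedef enum { SYNC_OK, SYNC_PANIC } sync_result;

/*
 * Flush one segment file to stable storage (hypothetical helper, not md.c).
 * Fail-stop policy: any fsync() failure is reported as SYNC_PANIC instead of
 * a retryable ERROR, because the kernel may already have dropped the dirty
 * pages, and a later fsync() on the same file might then report success even
 * though the data never reached the disk.
 */
static sync_result sync_segment(int fd, const char *path)
{
    assert(path != NULL);

    if (fsync(fd) == 0)
        return SYNC_OK;

    /* Report the failure once; the caller must stop, not retry. */
    fprintf(stderr, "PANIC: could not fsync \"%s\": %s\n",
            path, strerror(errno));
    return SYNC_PANIC;
}
```

On a freshly written temporary file sync_segment() returns SYNC_OK; on a descriptor that has already been closed, fsync() fails (EBADF) and the sketch returns SYNC_PANIC. It returns a code rather than calling abort() only to keep the sketch testable; under the policy discussed in this thread, that point would end the process and leave recovery to WAL replay.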
