On Fri, Oct 28, 2005 at 02:26:31PM +1000, Gavin Sherry wrote:
> Have spoken with Jim on IRC, he says that there have been several crashes
> recently due to a faulty disk array. I guess the zeroing could be an
> outcome of the faulty disk. I wonder if the crash the faulty disk resulted
> in could have been caused some where around mdextend() where we create a
> zero'd page but before we could have written out the initialised page.
Just to clarify, there's no evidence that the array is faulty. I do know
that they were using write-back with a non-battery-backed cache though.
What has been happening is periodic random crashes, about once a week. I
now have a good core from one of them, as well as an assert:
TRAP: FailedAssertion("!(shared->page_number[slotno] == pageno &&
shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)", File:
"slru.c", Line: 308)
I haven't looked at that code yet, so I have no idea what that actually
means. Let me know what info y'all would like to see out of the core.
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461