Re: [pgadmin-hackers] developer.pgadmin.org/nagios.pgadmin.org - Diskfailure

From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: <blacknoz(at)club-internet(dot)fr>, <dpage(at)vale-housing(dot)co(dot)uk>
Cc: <jam(at)zoidtechnologies(dot)com>, <pgadmin-hackers(at)postgresql(dot)org>, <pgsql-www(at)postgresql(dot)org>
Subject: Re: [pgadmin-hackers] developer.pgadmin.org/nagios.pgadmin.org - Diskfailure
Date: 2006-05-13 19:29:19
Message-ID: 001501c676c3$891ddd89$6a01a8c0@valehousing.co.uk
Lists: pgadmin-hackers pgsql-www


-----Original Message-----
From: "Raphaël Enrici"<blacknoz(at)club-internet(dot)fr>
Sent: 13/05/06 12:45:59
To: "Dave Page"<dpage(at)vale-housing(dot)co(dot)uk>
Cc: "Jeff MacDonald"<jam(at)zoidtechnologies(dot)com>, "pgadmin-hackers(at)postgresql(dot)org"<pgadmin-hackers(at)postgresql(dot)org>, "PostgreSQL WWW"<pgsql-www(at)postgresql(dot)org>
Subject: Re: [pgadmin-hackers] [pgsql-www] developer.pgadmin.org/nagios.pgadmin.org - Diskfailure

Hi Raph,

>I recently (2 months ago) experienced kernel crash with reiserfs after
> some electrical failure. I solved the problem by doing a full fsck (I
> mean fsck and then a reiserfs rebuild of the tree [dangerous]). It
> worked, at least for me.

Thanks - I'm leaning towards the memory issue atm, as the box seems to be OK again following a reboot, and the svn repo, which previously wouldn't tar or rsync, now verifies perfectly and can be tarred up.

I'll swap the sticks on Monday and, if that doesn't work, consider a 'full fsck'. If that fails, I guess I'll just move it into the new chassis and use scp to back up to another box until the new scsi cable arrives.

Cheers, Dave

-----Unmodified Original Message-----
Dave Page wrote:
>
>
>
>>-----Original Message-----
>>From: Jeff MacDonald [mailto:jam(at)zoidtechnologies(dot)com]
>>Sent: 12 May 2006 23:19
>>To: Dave Page
>>Cc: Jeff MacDonald
>>Subject: Re: [pgsql-www] developer.pgadmin.org/nagios.pgadmin.org - Diskfailure
>>
>>On Fri, 2006-05-12 at 22:47 +0100, Dave Page wrote:
>>
>>>The machine hosting the developer.pgadmin.org and nagios.pgadmin.org
>>>vservers is currently having serious filesystem problems, which are
>>>causing disk intensive operations (like rsync, tar) to segfault for
>>>currently unknown reasons.
>>
>>do a memory test, swap as needed, see if that solves the
>>problem..
>
>
> I'll try just replacing it - I have some unopened sticks for that mobo.
> FWIW, a reboot with a forced fsck found no errors at all and the box is
> currently working OK, but I have now found errors similar to the
> following:
>
> May 12 21:11:29 barbas rsyncd[32134]: rsync: writefd_unbuffered failed
> to write 4 bytes: phase "send_file_entry" [sender]: Broken pipe (32)
> May 12 21:11:29 barbas rsyncd[32134]: rsync error: error in rsync
> protocol data stream (code 12) at io.c(1126) [sender]
> May 12 22:13:52 barbas kernel: kernel BUG at page_alloc.c:142!
> May 12 22:13:52 barbas kernel: invalid operand: 0000
> May 12 22:13:52 barbas kernel: CPU: 1
> May 12 22:13:52 barbas kernel: EIP: 0010:[<c013cec0>] Not tainted
> May 12 22:13:52 barbas kernel: EFLAGS: 00010286
> May 12 22:13:52 barbas kernel: eax: d9e18100 ebx: c262c140 ecx:
> c262c140 edx: 00000000
> May 12 22:13:52 barbas kernel: esi: c262c140 edi: 00000000 ebp:
> 00000000 esp: d50d5edc
> May 12 22:13:52 barbas kernel: ds: 0018 es: 0018 ss: 0018
> May 12 22:13:52 barbas kernel: Process rsync (pid: 32141,
> stackpage=d50d5000)
> May 12 22:13:52 barbas kernel: Stack: d50d5ee8 c0133ab0 00001000
> c262c140 e3a59d44 00006000 c01348e9 00000000
> May 12 22:13:52 barbas kernel: 00000000 00001000 c262c140
> e3a59d44 00000000 c013423d d50d5f7c c262c140
> May 12 22:13:52 barbas kernel: 00000000 00001000 00001000
> 00000001 00000000 0000013b e3a59c80 c01347f0
> May 12 22:13:52 barbas kernel: Call Trace: [<c0133ab0>] [<c01348e9>]
> [<c013423d>] [<c01347f0>] [<c01347f0>]
> May 12 22:13:52 barbas kernel: [<c0134a2f>] [<c01347f0>] [<c0145a50>]
> [<c0108fdf>]
> May 12 22:13:52 barbas kernel:
> May 12 22:13:52 barbas kernel: Code: 0f 0b 8e 00 6b ba 37 c0 e9 ba fd ff
> ff 8b 69 60 85 ed 0f 85

Dave,

I recently (2 months ago) experienced kernel crash with reiserfs after
some electrical failure. I solved the problem by doing a full fsck (I
mean fsck and then a reiserfs rebuild of the tree [dangerous]). It
worked, at least for me.

Regards,
Raphaël
