Re: developer.pgadmin.org/nagios.pgadmin.org - Disk failure

From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: "Jeff MacDonald" <jam(at)zoidtechnologies(dot)com>
Cc: <pgadmin-hackers(at)postgresql(dot)org>, "PostgreSQL WWW" <pgsql-www(at)postgresql(dot)org>
Subject: Re: developer.pgadmin.org/nagios.pgadmin.org - Disk failure
Date: 2006-05-12 22:37:21
Message-ID: E7F85A1B5FF8D44C8A1AF6885BC9A0E401388168@ratbert.vale-housing.co.uk
Lists: pgadmin-hackers pgsql-www

> -----Original Message-----
> From: Jeff MacDonald [mailto:jam(at)zoidtechnologies(dot)com]
> Sent: 12 May 2006 23:19
> To: Dave Page
> Cc: Jeff MacDonald
> Subject: Re: [pgsql-www] developer.pgadmin.org/nagios.pgadmin.org - Disk failure
>
> On Fri, 2006-05-12 at 22:47 +0100, Dave Page wrote:
> > The machine hosting the developer.pgadmin.org and nagios.pgadmin.org
> > vservers is currently having serious filesystem problems, which are
> > causing disk intensive operations (like rsync, tar) to segfault for
> > currently unknown reasons.
>
> do a memory test, swap as needed, see if that solves the
> problem..

I'll try just replacing it - I have some unopened sticks for that mobo.
FWIW, a reboot with a forced fsck found no errors at all and the box is
currently working OK, but I have now found errors similar to the
following:

May 12 21:11:29 barbas rsyncd[32134]: rsync: writefd_unbuffered failed to write 4 bytes: phase "send_file_entry" [sender]: Broken pipe (32)
May 12 21:11:29 barbas rsyncd[32134]: rsync error: error in rsync protocol data stream (code 12) at io.c(1126) [sender]
May 12 22:13:52 barbas kernel: kernel BUG at page_alloc.c:142!
May 12 22:13:52 barbas kernel: invalid operand: 0000
May 12 22:13:52 barbas kernel: CPU: 1
May 12 22:13:52 barbas kernel: EIP: 0010:[<c013cec0>] Not tainted
May 12 22:13:52 barbas kernel: EFLAGS: 00010286
May 12 22:13:52 barbas kernel: eax: d9e18100 ebx: c262c140 ecx: c262c140 edx: 00000000
May 12 22:13:52 barbas kernel: esi: c262c140 edi: 00000000 ebp: 00000000 esp: d50d5edc
May 12 22:13:52 barbas kernel: ds: 0018 es: 0018 ss: 0018
May 12 22:13:52 barbas kernel: Process rsync (pid: 32141, stackpage=d50d5000)
May 12 22:13:52 barbas kernel: Stack: d50d5ee8 c0133ab0 00001000 c262c140 e3a59d44 00006000 c01348e9 00000000
May 12 22:13:52 barbas kernel: 00000000 00001000 c262c140 e3a59d44 00000000 c013423d d50d5f7c c262c140
May 12 22:13:52 barbas kernel: 00000000 00001000 00001000 00000001 00000000 0000013b e3a59c80 c01347f0
May 12 22:13:52 barbas kernel: Call Trace: [<c0133ab0>] [<c01348e9>] [<c013423d>] [<c01347f0>] [<c01347f0>]
May 12 22:13:52 barbas kernel: [<c0134a2f>] [<c01347f0>] [<c0145a50>] [<c0108fdf>]
May 12 22:13:52 barbas kernel:
May 12 22:13:52 barbas kernel: Code: 0f 0b 8e 00 6b ba 37 c0 e9 ba fd ff ff 8b 69 60 85 ed 0f 85

Could well be a duff stick I guess, given where it died.
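
For anyone wanting to check their own logs for the same thing, here is a rough sketch in Python of how the "kernel BUG at" lines could be pulled out of syslog output. It is not something that was run on barbas; the function name find_kernel_bugs and both regular expressions are made up for illustration, and they assume the classic "Mon DD HH:MM:SS host kernel: ..." line layout shown above, with each log entry on a single line.

```python
import re

# Assumed layout, taken from the log excerpt above:
#   May 12 22:13:52 barbas kernel: kernel BUG at page_alloc.c:142!
BUG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"kernel BUG at (?P<file>[^:\s]+):(?P<line>\d+)!"
)
# Matches the "Process rsync (pid: 32141," line that follows the BUG line.
PROC_RE = re.compile(r"kernel:\s+Process (?P<name>\S+) \(pid: (?P<pid>\d+)")


def find_kernel_bugs(lines):
    """Return one dict per 'kernel BUG at' line, plus the process if it was logged."""
    bugs = []
    for line in lines:
        m = BUG_RE.match(line)
        if m:
            bugs.append({
                "stamp": m.group("stamp"),
                "host": m.group("host"),
                "file": m.group("file"),
                "line": int(m.group("line")),
                "process": None,
                "pid": None,
            })
            continue
        # Attach the first "Process ..." line seen after a BUG line to that BUG.
        p = PROC_RE.search(line)
        if p and bugs and bugs[-1]["process"] is None:
            bugs[-1]["process"] = p.group("name")
            bugs[-1]["pid"] = int(p.group("pid"))
    return bugs


sample = [
    "May 12 22:13:52 barbas kernel: kernel BUG at page_alloc.c:142!",
    "May 12 22:13:52 barbas kernel: invalid operand: 0000",
    "May 12 22:13:52 barbas kernel: Process rsync (pid: 32141,",
]
for bug in find_kernel_bugs(sample):
    print(bug)
```

This only summarises which source file and process were involved each time; it says nothing about whether the cause is RAM, disk, or the kernel itself.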

> the quicker solution may be to just put the backup
> machine into production rather than running exhaustive memory tests.

Yes, well it was going into it anyway to get it out of the current 3U
chassis and into a 1U one with full OOB management. The only problem is
that I'm still awaiting delivery of a cable for the external tape drive
in the rack so I can only do rsync/scp backups until that arrives.

Regards, Dave.
