Re: IO Timeout

From: Alex Turner <armtuk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org, Alex Turner <aturner(at)neteconomist(dot)com>
Subject: Re: IO Timeout
Date: 2005-03-11 14:47:31
Message-ID: 33c6269f050311064772fac861@mail.gmail.com
Lists: pgsql-admin pgsql-general

Thanks very much, Tom, for your input.

The guys at AMCC are suggesting that the firmware on the controller
card crashed, causing the card to basically stop IO operations. This
would explain why postgres could not recover and re-read WAL, because
/dev/sdc and sdd were inaccessible at that time.

I think this puzzle is mostly solved - all we need to do now is
figure out what the heck happened on the controller card!

Thanks,

Alex Turner

On Thu, 10 Mar 2005 23:09:07 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alex Turner <armtuk(at)gmail(dot)com> writes:
> > Well - I am sort of trying to piece together exactly what happened.
> > Here's what I know.
>
> > Around 02:52 I get messages in my syslog stating that there were
> > problems writing to a controller channel:
> > [ various hardware errors snipped ]
>
> > At around 07:30 all connections were failing giving the error:
> > InternalError: FATAL: the database system is in recovery mode
>
> I think what happened here is that Postgres got a write error on WAL,
> which would probably cause a PANIC, and then the ensuing database reboot
> got hung up trying to re-read WAL. Client connection requests would be
> refused with messages like the above until the recovery process
> completed. The fact that this was still going on 4+ hours later shows
> that Postgres is *not* timing out on stuck disk operations ... very much
> the reverse in fact.
>
> You'd be best off to take the matter up with some kernel hackers.
> If there's anything to be done to improve the behavior, it's at
> the kernel device driver level.
>
> regards, tom lane
>
>
