Re: Further info : Very high load average but no cpu utilization ?

From: "Rajesh Kumar Mallah(dot)" <mallah(at)trade-india(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: Further info : Very high load average but no cpu utilization ?
Date: 2002-05-12 05:46:30
Message-ID: 200205121116.30681.mallah@trade-india.com
Lists: pgsql-hackers pgsql-sql

Hi there,

I have observed that it is nearly impossible to
get rid of the postmaster or its backends with any signal
once they decide not to quit.

Even the OS (Linux rh62) refused to reboot in that situation,
and my system admin had to power off the system,
then run fsck and so on.

But this only happens when the postmaster is stuck for
some reason. I suspect that the postmaster's log file
filling up was the reason my postmaster got stuck.
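
One possible explanation, consistent with the ps listing quoted below: most of those processes are in state D (uninterruptible sleep), and a process in that state does not act on any signal, SIGKILL included, until the kernel call it is blocked in returns. The Z entries are already dead and are only waiting for their parent to reap them. A minimal sketch for grouping PIDs by state, assuming the usual `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); `classify_ps_lines` is a hypothetical helper name, not part of any tool:

```python
from collections import defaultdict


def classify_ps_lines(lines):
    """Group PIDs by the first letter of the STAT column of `ps aux` output.

    Assumes the standard column order; lines that do not look like a
    process row (header, truncated lines) are skipped.
    """
    by_state = defaultdict(list)
    for line in lines:
        # 10 splits -> 11 fields; the last field is the whole command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        state = fields[7][0]  # e.g. "D", "S", "Z"
        by_state[state].append(int(fields[1]))
    return dict(by_state)


# Three rows taken from the listing quoted in this message.
sample = [
    "postgres 1133 0.0 0.0 139576 4 ? S May10 0:18 postgres: stats collector process",
    "postgres 8089 0.0 0.0 139812 4 ? D 00:26 0:00 postgres: checkpoint subprocess",
    "postgres 15453 0.1 0.0 0 0 ? Z 08:17 0:09 [postmaster <defunct>]",
]
states = classify_ps_lines(sample)
print(states)
```

Anything that shows up under "D" for hours is probably waiting on I/O (a full disk or a hung filesystem would be candidates), which would also fit the machine refusing to reboot cleanly.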

regds
mallah.

On Saturday 11 May 2002 09:29 pm, Tom Lane wrote:
> "Rajesh Kumar Mallah." <mallah(at)trade-india(dot)com> writes:
> > [root(at)linux10320 root2]# ps auxwww| grep post
> > postgres 1131 0.0 0.0 139424 4 ? D May1004/usr/local/pgsql/bin/postmaster
> > postgres 1132 0.0 0.0 140412 4 ? D May10 0:13 postgres: stats buffer process
> > postgres 1133 0.0 0.0 139576 4 ? S May10 0:18 postgres: stats collector process
> > postgres 8046 0.0 0.0 238712 4 ? D 00:25 0:13 postgres: tradein tradein_clients 130.94.20.27 SELECT
> > postgres 8089 0.0 0.0 139812 4 ? D 00:26 0:00 postgres: checkpoint subprocess
> > postgres 11442 0.0 0.0 218152 4 ? D 04:25 0:03 postgres: tradein tradein_clients 130.94.20.27 SELECT
> > postgres 15453 0.1 0.0 0 0 ? Z 08:17 0:09 [postmaster <defunct>]
> > postgres 15455 0.0 0.0 0 0 ? Z 08:17 0:00 [postmaster <defunct>]
> > postgres 15456 0.0 0.0 0 0 ? Z 08:18 0:00 [postmaster <defunct>]
> > postgres 15457 0.0 0.0 0 0 ? Z 08:19 0:00 [postmaster <defunct>]
> > postgres 15462 0.0 0.0 0 0 ? Z 08:20 0:01 [postmaster <defunct>]
>
> I think your postmaster is stuck; it should have reaped those defunct
> subprocesses instantly. Given that you also seem to have a stuck
> checkpoint process (8 hours to run a checkpoint?) there is probably
> something hosed in the interprocess communication logic, but it's hard
> to guess what from this amount of info.
>
> At this point probably your best bet is to kill all the running postgres
> processes (try SIGTERM first, then SIGKILL if that doesn't work) and
> launch a postmaster from a fresh start. Don't forget the ulimit this
> time.
>
> regards, tom lane

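The "SIGTERM first, then SIGKILL" advice quoted above could be scripted roughly as follows. This is only a sketch under stated assumptions: `terminate_then_kill`, its grace period, and its return strings are made up for illustration, and it assumes a POSIX system where the caller may signal the target. It will not help with a process stuck in state D, since that process does not act on SIGKILL until it leaves the uninterruptible sleep.

```python
import os
import signal
import time


def _exists(pid):
    """Return True if a process with this PID still exists (zombies count)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def terminate_then_kill(pid, grace_seconds=10.0, poll_interval=0.1):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns "not-running", "terminated", or "killed" (SIGKILL was sent;
    a process in uninterruptible sleep may still linger afterwards).
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not-running"

    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not _exists(pid):
            return "terminated"
        time.sleep(poll_interval)

    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"  # exited just after the grace period
    return "killed"
```

One would call this once per PID from the ps listing, backends first and the postmaster last, and only then start a fresh postmaster.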
--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.
