Re: Quite strange crash

From: Denis Perchine <dyp(at)perchine(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Quite strange crash
Date: 2001-01-09 06:13:29
Message-ID: 0101091213290B.00613@dyp.perchine.com
Lists: pgsql-hackers

On Monday 08 January 2001 23:21, Tom Lane wrote:
> Denis Perchine <dyp(at)perchine(dot)com> writes:
> >>>>>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >>>>>
> >>>>> Were there any errors before that?
> >
> > Actually you can have a look on the logs yourself.
>
> Well, I found a smoking gun:
>
> Jan 7 04:27:51 mx postgres[2501]: FATAL 1: The system is shutting down
>
> PID 2501 had been running:
>
> Jan 7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

Hmmm... actually, this is a real problem with vacuum lazy. Sometimes it just
keeps working for an enormous amount of time (I have mailed a sample database
to Vadim, but have not received a response yet). It is possible that it was me
who killed the backend.

> What seems to have happened is that 2501 curled up and died, leaving
> one or more buffer spinlocks locked. Roughly one spinlock timeout
> later, at 04:29:07, we have 1008 complaining of a stuck spinlock.
> So that fits.
>
> The real question is what happened to 2501? None of the other backends
> reported a SIGTERM signal, so the signal did not come from the
> postmaster.
>
> Another interesting datapoint: there is a second place in this logfile
> where one single backend reports SIGTERM while its brethren keep running:
>
> Jan 7 04:30:47 mx postgres[4269]: query: vacuum verbose;
> ...
> Jan 7 04:38:16 mx postgres[4269]: FATAL 1: The system is shutting down

Hmmm... Maybe this was me as well... But I am not sure about this one.

> There is something pretty fishy about this. You aren't by any chance
> running the postmaster under a ulimit setting that might cut off
> individual backends after a certain amount of CPU time, are you?

[postgres(at)mx postgres]$ ulimit -a
core file size (blocks)     1000000
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max memory size (kbytes)    unlimited
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          2048
pipe size (512 bytes)       8
open files                  1024
virtual memory (kbytes)     2105343

No, there are no such ulimits.
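
Resource limits are per-process and inherited, so the same check can also be
made from inside a process rather than from a login shell. A minimal sketch,
assuming a POSIX system with getrlimit(); the helper name
cpu_time_is_unlimited is made up for illustration and is not PostgreSQL code:

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (illustration only, not PostgreSQL code).
 * Returns 1 if the calling process has no soft CPU-time limit,
 * 0 if a finite soft limit is set, and -1 if getrlimit() fails.
 */
int cpu_time_is_unlimited(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CPU, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY ? 1 : 0;
}
```

A return value of 1 corresponds to the "cpu time (seconds) unlimited" line in
the ulimit output.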

> What signal does a ulimit violation deliver on your machine, anyway?

if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
    /* Send SIGXCPU every second.. */
    if (!(psecs % HZ))
        send_sig(SIGXCPU, p, 1);
    /* and SIGKILL when we go over max.. */
    if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
        send_sig(SIGKILL, p, 1);
}

This part of the kernel shows the logic. It means that a process will get
SIGXCPU every second while it is above the soft limit, and SIGKILL once it
goes above the hard limit.
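
To make the behaviour easier to follow, here is a rough user-space model of
that check. It is only a sketch: the function name cpu_limit_signals, the
SENT_* flags, and the use of ULONG_MAX to stand in for an "unlimited" setting
are assumptions for illustration, not kernel code. The "ticks" argument plays
the role of psecs (CPU time consumed, in clock ticks) and "hz" the role of HZ.

```c
#include <limits.h>

/* Hypothetical flags describing which signals the check would send. */
enum { SENT_NONE = 0, SENT_XCPU = 1, SENT_KILL = 2 };

/*
 * Hypothetical model of the kernel snippet: given the CPU time consumed
 * (in ticks), the tick rate, and the soft/hard limits in seconds, return
 * a bitmask of the signals that would be sent on this tick.
 */
int cpu_limit_signals(unsigned long ticks, unsigned long hz,
                      unsigned long soft, unsigned long hard)
{
    int sent = SENT_NONE;

    if (ticks / hz > soft) {
        /* SIGXCPU only on ticks that fall on a whole-second boundary */
        if (ticks % hz == 0)
            sent |= SENT_XCPU;
        /* SIGKILL on any tick once the hard limit is exceeded */
        if (ticks / hz > hard)
            sent |= SENT_KILL;
    }
    return sent;
}
```

With hz = 100, soft = 10 and hard = 20, a process at 1100 ticks would get only
SIGXCPU, and one at 2150 ticks would get only SIGKILL. With both limits at
ULONG_MAX (standing in for "unlimited"), the outer test is never true and no
signal is sent.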

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp(at)perchine(dot)com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
