Re: [HACKERS] Major bug, possible, with Solaris 7?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Daryl W(dot) Dunbar" <daryl(at)www(dot)com>
Cc: "The Hermit Hacker" <scrappy(at)hub(dot)org>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Major bug, possible, with Solaris 7?
Date: 1999-02-20 21:48:52
Message-ID: 21979.919547332@sss.pgh.pa.us
Lists: pgsql-hackers

"Daryl W. Dunbar" <daryl(at)www(dot)com> writes:
> Problem still exists in 6.4.3.

I figured it probably would :-(.

As far as I can tell from your truss trace, the processes are going
to sleep via semop() and never being awoken. There's not much more
that we can find out at the kernel level, since the kernel can't tell
*why* a backend thinks it needs to go to sleep. Assuming that
TEST_AND_SET is defined in your compilation, each backend uses only
one semaphore, and all blocking/awakening is done via that same
semaphore. We need to know what lock-manager condition is causing
each backend to decide to block and why the lock is not getting
released.

I was hoping that a gdb backtrace would tell us more --- it's bad that
you can't get any info that way. On my system (HPUX) gdb has a problem
with debugging shared libraries in a process that you attach to, as
opposed to starting fresh under gdb. I dunno if Solaris is similar, but
it might be worth building your -g version of the backend with no shared
libraries, everything linked statically (-static option, I think, when
linking the postgres binary). If your system doesn't have a static
version of libc then this won't help.

But probably the first thing to try at this point is adding a bunch of
debugging printouts. If you compile with -DLOCK_MGR_DEBUG (see
src/backend/storage/lmgr/lock.c) and turn on the trace-locks option then
you'll get a bunch more log output that should tell us something useful
about why the processes are deciding to block.

regards, tom lane
