Re: trouble restarting a server

From: "Peter Koczan" <pjkoczan(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: trouble restarting a server
Date: 2007-05-31 16:55:20
Message-ID: 4544e0330705310955g3c95679vbbda98773a687d22@mail.gmail.com
Lists: pgsql-admin

It finally reoccurred. Here's what I got from attaching to those processes
with gdb. I attached using the postmaster binary; let me know if I should use
something else.

vero(su): ps axvw | grep notify
24556 ? Ss 0:03 0 3265 41262 29672 0.7 postgres: jerel csdb chef(36275) notify interrupt
2889 ? Ss 0:04 0 3265 41270 29688 0.7 postgres: ela csdb newton(32777) notify interrupt
2943 ? Ss 0:04 0 3265 41270 29684 0.7 postgres: stefan csdb stupid(32788) notify interrupt
5866 ? Ss 0:04 0 3265 41270 29680 0.7 postgres: petska csdb brian(32786) notify interrupt
27850 ? Ss 0:03 0 3265 41270 29768 0.7 postgres: dparter csdb yfandes(35456) notify interrupt
18582 ? Ss 0:03 0 3265 41270 29732 0.7 postgres: timc csdb tornado(47047) notify interrupt
449 ? Ss 0:02 0 3265 41270 29764 0.7 postgres: archer csdb spoon(33141) notify interrupt
12731 pts/0 S+ 0:00 0 71 3828 664 0.0 grep notify

vero(su): gdb /s/postgresql/bin/postmaster 24556
[----- Begin lots of messages -----]
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

Attaching to program: /afs/cs.wisc.edu/s/postgresql-8.2.4/@sys/bin/postmaster, process 5866
Reading symbols from /lib/libpam.so.0...done.
Loaded symbols for /lib/libpam.so.0
Reading symbols from /afs/cs.wisc.edu/s/openssl-0.9.8d/@sys/lib/libssl.so.0.9.8d...done.
Loaded symbols for /s/openssl-0.9.8d/lib/libssl.so.0.9.8d
Reading symbols from /afs/cs.wisc.edu/s/openssl-0.9.8d/@sys/lib/libcrypto.so.0.9.8d...done.
Loaded symbols for /s/openssl-0.9.8d/lib/libcrypto.so.0.9.8d
Reading symbols from /afs/cs.wisc.edu/s/krb5-1.5.1/@sys/lib/libkrb5.so.3...done.
Loaded symbols for /s/krb5-1.5.1/lib/libkrb5.so.3
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /afs/cs.wisc.edu/s/krb5-1.5.1/@sys/lib/libcom_err.so.3...done.
Loaded symbols for /s/krb5-1.5.1/lib/libcom_err.so.3
Reading symbols from /lib/libaudit.so.0...done.
Loaded symbols for /lib/libaudit.so.0
Reading symbols from /afs/cs.wisc.edu/s/krb5-1.5.1/i386_cent40/lib/libk5crypto.so.3...done.
Loaded symbols for /s/krb5-1.5.1/i386_cent40/lib/libk5crypto.so.3
Reading symbols from /afs/cs.wisc.edu/s/krb5-1.5.1/i386_cent40/lib/libkrb5support.so.0...done.
Loaded symbols for /s/krb5-1.5.1/i386_cent40/lib/libkrb5support.so.0
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
0x007ef7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
[----- End lots of messages -----]

(gdb) bt
#0 0x007ef7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x008c60f3 in __write_nocancel () from /lib/tls/libc.so.6
#2 0x0064d734 in sock_write () from /s/openssl-0.9.8d/lib/libcrypto.so.0.9.8d
#3 0x00000008 in ?? ()
#4 0x09ede1a2 in ?? ()
#5 0x00000038 in ?? ()
#6 0x09ee50c0 in ?? ()
#7 0x006e06c8 in ?? () from /s/openssl-0.9.8d/lib/libcrypto.so.0.9.8d
#8 0x09ed4148 in ?? ()
#9 0x00000000 in ?? ()

All the other processes are the same except for addresses in #4, #6, and #8,
but they're all within a few MB of each other (they're probably asynchronous
interrupts).
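
For repeating the attach step across every stuck backend, here is a minimal
sketch, not something that was run on the server. It assumes ps output in the
same column layout as above and reuses the same "gdb <binary> <pid>"
invocation; the function names stuck_pids and gdb_commands are hypothetical
helpers made up for illustration.

```python
# Sample lines copied from the ps listing above (wrapped lines rejoined).
PS_OUTPUT = """\
24556 ? Ss 0:03 0 3265 41262 29672 0.7 postgres: jerel csdb chef(36275) notify interrupt
2889 ? Ss 0:04 0 3265 41270 29688 0.7 postgres: ela csdb newton(32777) notify interrupt
5866 ? Ss 0:04 0 3265 41270 29680 0.7 postgres: petska csdb brian(32786) notify interrupt
12731 pts/0 S+ 0:00 0 71 3828 664 0.0 grep notify
"""


def stuck_pids(ps_text, state="notify interrupt"):
    """Return PIDs of postgres backends whose process title ends with `state`."""
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a PID.
        if not fields or not fields[0].isdigit():
            continue
        # Keep only postgres backends; this also drops the "grep notify" line.
        if "postgres:" in fields and line.rstrip().endswith(state):
            pids.append(int(fields[0]))
    return pids


def gdb_commands(pids, binary="/s/postgresql/bin/postmaster"):
    """Build one 'gdb <binary> <pid>' command line per PID (assumed binary path)."""
    return [f"gdb {binary} {pid}" for pid in pids]


if __name__ == "__main__":
    for cmd in gdb_commands(stuck_pids(PS_OUTPUT)):
        print(cmd)
```

Each printed command still has to be run by hand, with "bt" typed at the
(gdb) prompt as above; the sketch only saves picking the PIDs out of the
listing.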

Let me know if you need more info.

Peter

On 5/22/07, Peter Koczan <pjkoczan(at)gmail(dot)com> wrote:
>
> The release is 8.2.4. I haven't been able to reproduce the condition yet,
> but I will send along stack traces as soon as I can. I have this strange
> feeling that it's only going to happen when I find a reason to make a
> restart-worthy config change.
>
> Peter
>
> On 5/21/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > "Peter Koczan" <pjkoczan(at)gmail(dot)com> writes:
> > > [ lots of processes stuck in "notify interrupt" code ]
> >
> > That's weird. If it's still in that state, or if you can reproduce it,
> > could you attach to a few of those processes with gdb and get stack
> > traces?
> >
> > Looking at the async.c code, an obvious candidate is that that routine
> > tries to take ExclusiveLock on pg_listener --- so if something had
> > managed to exit without releasing a lock on that table, hangups could be
> > expected. But if that were the case, you'd think the process status
> > lines would include "waiting". My guess is they're blocked on something
> > lower-level than a table lock, but without a stack trace it's hard to
> > guess what.
> >
> > Which PG release is this exactly?
> >
> > regards, tom lane
> >
>
>
