Re: Hung postmaster (8.3.9)

From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Hung postmaster (8.3.9)
Date: 2010-03-01 22:59:09
Message-ID: 201003011559.09901.pgsql@bluepolka.net
Lists: pgsql-general pgsql-hackers

On Monday 01 March 2010 @ 15:46, Ed L. wrote:
> On Monday 01 March 2010 @ 15:41, Ed Loehr (LoehrTech.com) wrote:
> > "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > > (gdb) bt
> > > #0 0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
> > > #1 0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
> > > #2 0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
> > > #3 0x000000346f8689cb in getc () from /lib64/libc.so.6
> > > #4 0x0000000000531ee8 in next_token (fp=0x5b90f20, buf=0x7fff59bef330 "", bufsz=4096) at hba.c:128
> > > #5 0x0000000000532233 in tokenize_file (filename=0x5b8f3f0 "global", file=0x5b90f20, lines=0x7fff59bef5c8, line_nums=0x7fff59bef5c0) at hba.c:232
> > > #6 0x00000000005322e9 in tokenize_file (filename=0x5b8f3d0 "global/pg_auth", file=0x5b90ce0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
> > > #7 0x00000000005327ff in load_role () at hba.c:959
> > > #8 0x000000000057f300 in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2145
> > > #9 <signal handler called>
> > > #10 0x000000346f8cb323 in __select_nocancel () from /lib64/libc.so.6
> > > #11 0x000000000057cc33 in ServerLoop () at postmaster.c:1236
> > > #12 0x000000000057dfdf in PostmasterMain (argc=6, argv=0x5b73fe0) at postmaster.c:1031
> > > #13 0x00000000005373de in main (argc=6, argv=<value optimized out>) at main.c:188
> >
> > The postmaster seems to be stuck trying to read
> > $PGDATA/global/pg_auth (which would be an expected thing
> > for it to do at this point in the startup sequence). Does
> > that file exist? Is it an ordinary file? Do its contents
> > look sane (a list of your userids and their passwords and
> > group memberships)?
>
> This just happened again ~24 hours after full reload from
> backup. Arrrgh.
>
> Backtrace looks the same again, same file, same
> __read_nocancel(). $PGDATA/global/pg_auth looks fine to me,
> permissions are 600, entries are 3 or more double-quoted items
> per line each separated by a space, items 3 and beyond being
> groups.
>
> Any clues?
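
For anyone wanting to repeat the sanity checks asked about above (is it an ordinary file, are the permissions 600, does each line look right), here is a rough Python sketch. The line format it checks is only the one described in this message (three or more double-quoted items separated by single spaces); it is not taken from the PostgreSQL source, and the name check_pg_auth is made up for illustration. It also assumes items contain no embedded double quotes.

```python
import os
import re
import stat

# One double-quoted item. Assumption: no embedded double quotes inside an item.
ITEM = r'"[^"]*"'
# Assumption from the description in this thread: 3 or more items per line,
# separated by single spaces.
LINE_RE = re.compile(rf'^{ITEM}(?: {ITEM}){{2,}}$')


def check_pg_auth(path):
    """Return a list of human-readable problems found in the file at `path`."""
    problems = []
    st = os.lstat(path)  # lstat so a symlink is reported rather than followed
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
        return problems  # do not try to read a FIFO, device, or similar
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o600:
        problems.append(f"permissions are {mode:o}, expected 600")
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if line and not LINE_RE.match(line):
                problems.append(f"line {lineno}: expected 3+ quoted items")
    return problems
```

Calling check_pg_auth on $PGDATA/global/pg_auth and getting an empty list back would only mean the file passes these superficial checks; it says nothing about why the read is blocking.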

According to the server logs, the system is still processing
data on existing connections. It just can't accept any new ones.
Here's a backtrace for a hung psql -c "select version()":

$ gdb `which psql`
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) attach 9631
Attaching to program: /opt/pgsql/installs/postgresql-8.3.9/bin/psql, process 9631
Reading symbols from /opt/pgsql/installs/postgresql-8.3.9/lib/libpq.so.5...done.
Loaded symbols for /opt/pgsql/installs/postgresql-8.3.9/lib/libpq.so.5
Reading symbols from /usr/lib64/libz.so.1...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /usr/lib64/libreadline.so.5...done.
Loaded symbols for /usr/lib64/libreadline.so.5
Reading symbols from /lib64/libtermcap.so.2...done.
Loaded symbols for /lib64/libtermcap.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
0x000000346f8c92af in poll () from /lib64/libc.so.6
(gdb) bt
#0 0x000000346f8c92af in poll () from /lib64/libc.so.6
#1 0x00002b03826e5e6f in pqSocketCheck (conn=0x655eef0, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1046
#2 0x00002b03826e5f10 in pqWaitTimed (forRead=1, forWrite=-1, conn=0x655eef0, finish_time=-1) at fe-misc.c:920
#3 0x00002b03826e1752 in connectDBComplete (conn=0x655eef0) at fe-connect.c:930
#4 0x00002b03826e2c60 in PQsetdbLogin (pghost=0x0, pgport=0x0, pgoptions=0x0, pgtty=0x0, dbName=0x0, login=0x0, pwd=0x0) at fe-connect.c:678
#5 0x000000000040e319 in main (argc=<value optimized out>, argv=0x7fff283ce6e8) at startup.c:195
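
The finish_time=-1 / end_time=-1 arguments in the frames above suggest the client is waiting with no connection timeout set, so it will sit there for as long as the postmaster is stuck. To detect the condition without leaving hung psql processes around, something like the following Python sketch could be used. The function name probe and the default values are made up for illustration; PGCONNECT_TIMEOUT is the libpq environment variable for the connect timeout, and the outer subprocess timeout is a second safety net that does not depend on libpq.

```python
import os
import subprocess


def probe(cmd, timeout_s=10.0, connect_timeout_s=5):
    """Run `cmd`; return its exit code, or None if it did not finish in time."""
    # Ask libpq-based clients to give up on their own after connect_timeout_s.
    env = dict(os.environ, PGCONNECT_TIMEOUT=str(connect_timeout_s))
    try:
        result = subprocess.run(
            cmd,
            env=env,
            timeout=timeout_s,  # hard deadline enforced by the parent process
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    return result.returncode
```

For example, probe(["psql", "-c", "select version()"]) returning None or a nonzero code would indicate that new connections are not being accepted, assuming psql is on the PATH and the usual connection defaults apply.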
