postmaster wedged

From: PG <defunct_shell(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: postmaster wedged
Date: 2004-02-09 16:36:03
Message-ID: 20040209163603.89748.qmail@web60307.mail.yahoo.com
Lists: pgsql-hackers


We use PostgreSQL 7.4, running on a modified Red Hat
Linux system, as the database that stores our
network-related data. The tables have millions of rows,
and user queries typically trigger several joins across
them. The database itself takes about 40 GB of disk
space. Our application uses libpq++.

Recently I found what appears to be a PostgreSQL bug.
The database had been running fine, and at some point
it stopped accepting connections. When I logged onto
the system, I found two postmaster processes in the
S (sleeping) state:

[root]# ps -aeuwx | grep post
postgres  2080  0.0  0.0 26752 2360 ?  S  Jan28  0:00 postgres: postgres dbname 1.0.0.5 idle
postgres  2081  0.0  0.0 26744 2380 ?  S  Jan28  0:00 postgres: postgres dbname 1.0.0.5 idle

Both processes showed the same stack trace when I
attached to them with gdb (see below), and both were
children of init. How could it have been started twice?
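One plausible reading, offered as a hypothesis rather than a diagnosis: on Unix, a process whose parent exits is reparented to init (PID 1). Titles like "postgres: postgres dbname 1.0.0.5 idle" are the ones per-connection backends set for themselves, so two such processes owned by init suggest their original parent, the postmaster, had already exited, not that it started twice. A minimal Linux-only sketch for checking a process's parent, assuming /proc is mounted; parent_pid_of is a hypothetical helper, not part of PostgreSQL:

```python
import os
import sys


def parent_pid_of(pid: int) -> int:
    """Return the parent PID of `pid` by parsing /proc/<pid>/stat (Linux-only assumption)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 is "(comm)" and may itself contain spaces or ')',
    # so split only what follows the LAST ')'.
    fields = stat[stat.rindex(")") + 1:].split()
    # fields[0] is the state letter (e.g. 'S'); fields[1] is the parent PID.
    return int(fields[1])


if __name__ == "__main__":
    # Check the PIDs given on the command line, or this process if none are given.
    pids = [int(arg) for arg in sys.argv[1:]] or [os.getpid()]
    for pid in pids:
        ppid = parent_pid_of(pid)
        note = "  <- reparented to init; original parent has exited" if ppid == 1 else ""
        print(f"pid {pid}: parent {ppid}{note}")
```

For example, running it with 2080 2081 as arguments would report a parent of 1 for both processes if they were still orphaned. Newer systems that use a "subreaper" may reparent orphans to a process other than PID 1.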

After shutting off all possible clients, I tried
"postgresql stop". That didn't work, and neither did
pg_ctl stop (with either -m fast or -m immediate).
Finally, killall -9 postmaster followed by
"postgresql start" got the server reading (replaying)
the XLOG for 5 minutes or so, after which it was back
up without any loss or corruption of data.

Any ideas? Is it possible that our application
(through libpq++) somehow caused the postmaster to hang?

Thanks
Prem.

Output of the pg_ctl stop attempt:

[root]# su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl stop -D /var/lib/pgsql/data -s -m fast"
/usr/bin/pg_ctl: line 274: kill: (1066) - No such process
pg_ctl: postmaster does not shut down

gdb backtrace (identical for both processes):

#0  0x18364c26 in recv () from /lib/libc.so.6
#1  0x080feaa5 in secure_read (port=0x82873b8, ptr=0x8240580, len=8192) at /root/src/postgres/src/backend/libpq/be-secure.c:304
#2  0x08103b83 in pq_recvbuf () at /root/src/postgres/src/backend/libpq/pqcomm.c:662
#3  0x08103c59 in pq_getbyte () at /root/src/postgres/src/backend/libpq/pqcomm.c:704
#4  0x0814c935 in SocketBackend (inBuf=0xbfffec10) at /root/src/postgres/src/backend/tcop/postgres.c:275
#5  0x0814cb17 in ReadCommand (inBuf=0xfffffe00) at /root/src/postgres/src/backend/tcop/postgres.c:397
#6  0x0814f018 in PostgresMain (argc=4, argv=0x8279590, username=0x8279560 "postgres") at /root/src/postgres/src/backend/tcop/postgres.c:2832
#7  0x0812f24b in BackendFork (port=0x82873b8) at /root/src/postgres/src/backend/postmaster/postmaster.c:2558
#8  0x0812ed3e in BackendStartup (port=0x82873b8) at /root/src/postgres/src/backend/postmaster/postmaster.c:2201
#9  0x0812d5ff in ServerLoop () at /root/src/postgres/src/backend/postmaster/postmaster.c:1113
#10 0x0812cfa4 in PostmasterMain (argc=4, argv=0x82786e8) at /root/src/postgres/src/backend/postmaster/postmaster.c:891
#11 0x08104d74 in main (argc=4, argv=0xbffffb94) at /root/src/postgres/src/backend/main/main.c:214
#12 0x182a55cd in __libc_start_main () from /lib/libc.so.6
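Reading the frames from ReadCommand down: the backend wants the client's next message, pq_getbyte needs a byte, pq_recvbuf calls secure_read, and that ends in recv(). This is consistent with the "idle" status in ps, and such a wait normally ends only when the client sends data or closes the connection. The sketch below mimics the three cases with a local socket pair; wait_for_client_message is a hypothetical name, and the timeout exists only so the demo cannot hang.

```python
import socket


def wait_for_client_message(sock: socket.socket, timeout: float) -> str:
    """Wait in recv() like an idle backend, but give up after `timeout` seconds.

    Returns 'data', 'eof' (peer closed its end), or 'timeout' (peer silent).
    """
    sock.settimeout(timeout)
    try:
        chunk = sock.recv(8192)  # same buffer size as secure_read's len=8192 in the trace
    except socket.timeout:
        return "timeout"
    return "data" if chunk else "eof"


if __name__ == "__main__":
    backend, client = socket.socketpair()
    # 1. Client connected but silent: recv() just waits (indefinitely, without a timeout).
    print("silent client:", wait_for_client_message(backend, 0.5))
    # 2. Client sends a byte: recv() returns it.
    client.sendall(b"Q")
    print("active client:", wait_for_client_message(backend, 0.5))
    # 3. Client closes its end: recv() returns b"" (EOF), so the waiter can exit.
    client.close()
    print("closed client:", wait_for_client_message(backend, 0.5))
    backend.close()
```

A TCP connection whose client host vanished without closing it (a crash or network failure) looks like case 1 to the server: recv() keeps waiting until TCP keepalive, if enabled, detects the dead peer.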
