Re: Hung postmaster (8.3.9)

From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Hung postmaster (8.3.9)
Date: 2010-03-01 23:40:55
Message-ID: 201003011640.55944.pgsql@bluepolka.net
Lists: pgsql-general, pgsql-hackers

On Monday 01 March 2010 @ 16:03, Ed L. wrote:
> On Monday 01 March 2010 @ 15:59, Ed L. wrote:
> > > This just happened again ~24 hours after full reload from
> > >  backup. Arrrgh.
> > >
> > > Backtrace looks the same again, same file, same
> > > __read_nocancel().  $PGDATA/global/pg_auth looks fine to
> > > me, permissions are 600, entries are 3 or more
> > > double-quoted items per line each separated by a space,
> > > items 3 and beyond being groups.
> > >
> > > Any clues?
> 
> Also seeing lots of postmaster zombies (190 and growing)...

While new connections are hanging, top shows the postmaster using
100% of CPU.  SIGTERM and SIGQUIT do nothing.  Here's a backtrace
of this busy postmaster:

(gdb) bt
#0  0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
#1  0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
#2  0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3  0x000000346f8689cb in getc () from /lib64/libc.so.6
#4  0x0000000000531ee8 in next_token (fp=0x10377ae0, buf=0x7fff32230e60 "", bufsz=4096) at hba.c:128
#5  0x0000000000532233 in tokenize_file (filename=0x10359b70 "global", file=0x10377ae0, lines=0x7fff322310f8, line_nums=0x7fff322310f0) at hba.c:232
#6  0x00000000005322e9 in tokenize_file (filename=0x2b1c8cbf5800 "global/pg_auth", file=0x103767a0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7  0x00000000005327ff in load_role () at hba.c:959
#8  0x000000000057f878 in sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:3830
#9  <signal handler called>
#10 0x000000346f8cb323 in __select_nocancel () from /lib64/libc.so.6
#11 0x000000000057cc33 in ServerLoop () at postmaster.c:1236
#12 0x000000000057dfdf in PostmasterMain (argc=6, argv=0x1033f000) at postmaster.c:1031
#13 0x00000000005373de in main (argc=6, argv=<value optimized out>) at main.c:188
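
Since the backtrace shows load_role() stuck while tokenizing global/pg_auth, a mechanical check of that file may be more reliable than eyeballing it. The following is only a rough sketch that tests each line against the shape described above (three or more double-quoted items, separated by single spaces). The function names are made up for illustration, and treating a doubled quote ("") inside an item as an escaped quote is an assumption, not something confirmed from the server source.

```python
import re

# One double-quoted item. Assumption: a doubled quote ("") inside an
# item stands for a literal quote character.
ITEM = r'"(?:[^"]|"")*"'
LINE_RE = re.compile(rf'{ITEM}(?: {ITEM})*')
ITEM_RE = re.compile(ITEM)


def check_auth_line(line):
    """Return a list of problems; an empty list means the line fits the described shape."""
    text = line.rstrip("\n")
    if not LINE_RE.fullmatch(text):
        return ["not a space-separated list of double-quoted items"]
    items = ITEM_RE.findall(text)
    if len(items) < 3:
        return [f"only {len(items)} item(s), expected at least 3"]
    return []


def check_auth_file(path):
    """Check every line of a file; return (line number, problem) pairs."""
    problems = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            for problem in check_auth_line(line):
                problems.append((lineno, problem))
    return problems
```

Running check_auth_file() on a copy of $PGDATA/global/pg_auth would list any line that does not fit. An empty result only means the file matches this simple shape, not that the postmaster's own tokenizer is happy with it.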

...and more from the server logs, fwiw:

2010-03-01 17:30:24.213 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:30:31.250 CST [32236]    DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:31:24.216 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:32:24.219 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:33:24.222 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:34:24.225 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:19.061 CST [32236]    LOG:  checkpoint starting: time
2010-03-01 17:35:19.185 CST [32236]    DEBUG:  recycled transaction log file "000000010000001C00000071"
2010-03-01 17:35:19.185 CST [32236]    LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.028 s, sync=0.000 s, total=0.124 s
2010-03-01 17:35:24.328 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:31.224 CST [32236]    DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:36:44.332 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:44.434 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  could not receive data from client: Connection timed out
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  unexpected EOF on client connection
2010-03-01 17:37:47.380 CST [3692] dba 10....(42816) dba LOG:  disconnection: session time: 2:11:15.303 user=dba database=dba host=... port=428
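
To see how often each message recurs in a longer log, something like the sketch below could tally lines by severity and message text. It assumes the line prefix looks like the excerpt above (timestamp, time zone, bracketed PID, optional session fields, then the severity). The regular expression and function names are illustrative, not part of any PostgreSQL tool.

```python
import re
from collections import Counter

# Assumed prefix, modeled on the excerpt above:
#   <date> <time.ms> <tz> [<pid>] <optional session fields> <LEVEL>:  <message>
LOG_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<tz>\w+) '
    r'\[(?P<pid>\d+)\]\s+(?P<session>.*?)'
    r'(?P<level>DEBUG|LOG|WARNING|ERROR|FATAL|PANIC):\s+(?P<msg>.*)$'
)


def tally_messages(lines):
    """Count occurrences of each (level, message) pair; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.rstrip("\n"))
        if match:
            counts[(match.group("level"), match.group("msg"))] += 1
    return counts


def tally_file(path):
    """Tally a whole log file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return tally_messages(fh)
```

Calling tally_file() on the server log and printing the result of most_common(10) would show whether the "worker took too long to start" warnings dominate. Wrapped continuation lines that lack the prefix are simply ignored.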
