| From: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> | 
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Active zombies at AIX | 
| Date: | 2017-01-24 15:08:05 | 
| Message-ID: | 06f4d085-e2a5-83a7-919a-cb5a878f9e42@postgrespro.ru | 
| Lists: | pgsql-hackers | 
Hi hackers,
Yet another story about AIX. For some reason, AIX is very slow to clean up
zombie processes.
If we launch pgbench with the -C option, the limit on the maximum
number of connections is exhausted very quickly.
If the maximum number of connections is set to 1000, then after ten seconds
of pgbench activity we get about 900 zombie processes, and it takes about
100 seconds (!) before all of them are terminated.
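Taking the figures above at face value, and assuming pgbench was stopped once the zombies had piled up, the rough rates are:

$$
\text{net accumulation} \approx \frac{900\ \text{zombies}}{10\ \text{s}} = 90\ \text{s}^{-1},
\qquad
\text{drain rate} \approx \frac{900\ \text{zombies}}{100\ \text{s}} = 9\ \text{s}^{-1}
$$

So zombies pile up at roughly 90 per second net while the backlog drains at only about 9 per second, which is why the 1000-connection limit is reached within seconds.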
proctree shows a lot of defunct processes:
[14:44:41]root(at)postgres:~ # proctree 26084446
26084446 /opt/postgresql/xlc/9.6/bin/postgres -D /postg_fs/postgresql/xlc
4784362 <defunct>
4980786 <defunct>
11403448 <defunct>
11468930 <defunct>
11993176 <defunct>
12189710 <defunct>
12517390 <defunct>
13238374 <defunct>
13565974 <defunct>
13893826 postgres: wal writer process
14024716 <defunct>
15401000 <defunct>
...
25691556 <defunct>
But ps shows that the status of the process is <exiting>:
[14:46:02]root(at)postgres:~ # ps -elk | grep 25691556
* A - 25691556 - - - - - <exiting>
A breakpoint set in the postmaster's reaper() function (invoked on SIGCHLD)
shows that each invocation of this function processes only 5-10 PIDs.
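Pending signals of the same type are not queued, so several child exits can be coalesced into a single SIGCHLD; that is why one reaper() call can collect several PIDs. Below is a minimal, flag-based sketch of that general pattern in C, assuming a POSIX system: the handler only sets a flag, and the main loop calls waitpid() with WNOHANG until nothing more is left. It is not the postmaster's actual code, and spawn_and_reap() and reap_exited_children() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; the actual reaping happens outside signal context. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void) signo;
    got_sigchld = 1;
}

/* Collect every child that has already exited. Several exits can be
 * coalesced into one pending SIGCHLD, so keep calling waitpid() with
 * WNOHANG until it reports no more finished children. */
static int reap_exited_children(void)
{
    int reaped = 0;
    int status;

    while (waitpid(-1, &status, WNOHANG) > 0) {
        /* Without WUNTRACED/WCONTINUED, only terminated children are reported. */
        assert(WIFEXITED(status) || WIFSIGNALED(status));
        reaped++;
    }
    return reaped;
}

/* Hypothetical driver: fork n children that exit at once and reap them
 * as SIGCHLD notifications arrive. Returns the number reaped, or -1. */
static int spawn_and_reap(int n)
{
    struct sigaction sa;
    sigset_t block, old;
    int total = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    /* Keep SIGCHLD blocked except inside sigsuspend(), so a notification
     * cannot slip in between testing the flag and going to sleep. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0)
            _exit(0);  /* child: exit immediately, like a short-lived backend */
    }

    while (total < n) {
        while (!got_sigchld)
            sigsuspend(&old);  /* atomically unblock SIGCHLD and wait */
        got_sigchld = 0;
        total += reap_exited_children();
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return total;
}
```

If the kernel reports children in batches, reap_exited_children() returns values greater than 1 per wakeup, much like the 5-10 PIDs per reaper() call observed above.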
So there are two hypotheses: either AIX is very slow to deliver
SIGCHLD to the parent, or the exit of the process takes too much time.
The fact that the backends are in the exiting state makes the second
hypothesis more likely.
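One way to separate the two hypotheses is to take SIGCHLD out of the picture: fork children that exit immediately and reap each one with a blocking waitpid(pid, ...), which only returns after the child has actually terminated and involves no signal handler. If even this loop is slow on AIX, the delay lies in process exit rather than in signal delivery. This is a minimal sketch under that assumption, not something tried in this thread; mean_fork_to_reap_ms() is a hypothetical name, and it measures a bare fork()/_exit() rather than a real backend with shared memory attached.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Fork n children that exit immediately and reap each one with a
 * blocking waitpid(), which does not depend on SIGCHLD delivery.
 * Returns the mean fork-to-reap latency in ms, or -1.0 on error. */
static double mean_fork_to_reap_ms(int n)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++) {
        double start = now_ms();
        pid_t pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0)
            _exit(0);  /* child: terminate right away */

        int status;
        if (waitpid(pid, &status, 0) != pid)  /* blocks until the child is gone */
            return -1.0;
        sum += now_ms() - start;
    }
    return n > 0 ? sum / n : 0.0;
}
```

Comparing the figure from an AIX machine with the same program on another platform would show whether a single exit is slow there even without the postmaster involved.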
We have tried different Postgres configurations with local and TCP
sockets and with different amounts of shared buffers, built with both gcc
and xlc.
In all cases the behavior is similar: the zombies do not want to die.
Since it is not possible to attach a debugger to a defunct process, it
is not clear how to find out what is going on.
I wonder if somebody has encountered similar problems on AIX and can
perhaps suggest a solution.
Thanks in advance
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company