Re: Deadlock in XLogInsert at AIX

From: "REIX, Tony" <tony(dot)reix(at)atos(dot)net>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Bernd Helmle <mailings(at)oopsware(dot)de>
Subject: Re: Deadlock in XLogInsert at AIX
Date: 2017-02-02 14:10:45
Message-ID: ed9bd7fc-2d29-dc77-e537-bb463f93e7d4@atos.net
Lists: pgsql-hackers

Hi Konstantin

I've discussed the "zombie/exit" issue with our expert here.

- He does not think that AIX does anything special here.

- If the process is marked <exiting> in ps, this is because the flag SEXIT is set, thus the process is blocked somewhere in the kexitx() syscall, waiting for something.

- In order to know what it is waiting for, the best would be to have a look with kdb.

- It is waiting either for an asynchronous I/O to complete, or for a thread to end if the process is multi-threaded.

- Using the proctree command for analyzing the issue is not a good idea, since the process will block in kexitx() if there is an operation on /proc being done

- If the process is marked <defunct>, that means that its parent has not yet called waitpid() to collect the child's exit status. Maybe the parent is blocked in non-interruptible code where the signal handler cannot be called.

- In short, there may be many causes... Using kdb is the best way to find out.

- Instead of proctree (which makes use of /proc), use: "ps -faT <pid>".
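
The <defunct> case described above can be reproduced with a small sketch. This is only an illustration under generic POSIX assumptions, not something taken from the PostgreSQL sources or tested on AIX; the helper names pid_exists and zombie_demo are hypothetical. The child exits immediately, and its process-table entry stays around until the parent calls waitpid().

```python
import os


def pid_exists(pid):
    """Return True while the kernel still has a process-table entry for pid."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def zombie_demo():
    """Fork a child that exits at once; report whether its entry exists
    before and after the parent reaps it (hypothetical helper)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately. Its copies of the pipe ends close on exit.
        os._exit(0)
    # Parent: close our write end so EOF arrives as soon as the child is gone.
    os.close(write_fd)
    os.read(read_fd, 1)  # blocks until the child has closed its descriptors
    os.close(read_fd)
    before = pid_exists(pid)  # child has exited but has not been reaped yet
    os.waitpid(pid, 0)        # parent collects the exit status
    after = pid_exists(pid)   # the entry is released once the status is read
    return before, after
```

Between the two checks, ps would be expected to show the child as defunct, because only the waitpid() call releases the entry.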

I'll try to reproduce here.

Regards

Tony

On 01/02/2017 at 21:26, Konstantin Knizhnik wrote:
On 02/01/2017 08:30 PM, REIX, Tony wrote:

....

About the zombie issue, I've discussed it with my colleagues. It looks like the process stays a zombie until its parent collects its status. However, although I have dealt with this several times, I do not remember the details well. And this should not be specific to AIX. Tomorrow I'll discuss it with another colleague, who should understand this better than I do.

1. The process is not in the zombie state (according to ps). It is in the <exiting> state... That is something AIX-specific; I have not seen processes in this state on Linux.
2. I have implemented a simple test, a forkbomb. It creates 1000 children and then waits for them. It is about ten times slower than on Intel/Linux, but still much faster than 100 seconds. So there is some difference between a Postgres backend and a dummy process that does nothing and terminates immediately after returning from fork().
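
A test of the kind described in item 2 might look roughly like the sketch below. It is not Konstantin's actual program, which is not shown in the thread; the function name fork_and_wait and its return value are assumptions made for illustration.

```python
import os
import time


def fork_and_wait(n_children):
    """Fork n_children processes that exit immediately, wait for all of
    them, and return (number_reaped_cleanly, elapsed_seconds)."""
    start = time.monotonic()
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: do nothing and terminate right after fork() returns.
            os._exit(0)
        pids.append(pid)

    reaped = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # blocks until this child is reaped
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            reaped += 1
    return reaped, time.monotonic() - start
```

Calling fork_and_wait(1000) mirrors the child count mentioned above. Comparing the elapsed time across platforms gives only a rough picture of fork/exit cost, and the exited children stay unreaped until the wait loop runs, so a per-user process limit may need to allow that many entries.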
....

Regards,

Tony

On 01/02/2017 at 16:59, Konstantin Knizhnik wrote:
Hi Tony,

On 01.02.2017 18:42, REIX, Tony wrote:

Hi Konstantin

XLC.

I'm on AIX 7.1 for now.

I'm using this version of XLC v13:

# xlc -qversion
IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)
Version: 13.01.0003.0003

With this version, I have 2 failing tests: create_aggregate and aggregates. (At least 2, since I tested with "check" and not "check-world" at that time.)

With the following XLC v12 version, I have NO test failure:

# /usr/vac/bin/xlc -qversion
IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
Version: 12.01.0000.0016

So maybe you are not using XLC v13.1.3.3, but another sub-version. Or are you passing more options to configure?

Configure.

What options do you pass to configure?

export CC="/opt/IBM/xlc/13.1.3/bin/xlc"
export CFLAGS="-qarch=pwr8 -qtune=pwr8 -O2 -qalign=natural -q64 "
export LDFLAGS="-Wl,-bbigtoc,-b64"
export AR="/usr/bin/ar -X64"
export LD="/usr/bin/ld -b64 "
export NM="/usr/bin/nm -X64"
./configure --prefix="/opt/postgresql/xlc-debug/9.6"

Heavy load & 64 cores? OK. That clearly explains why I do not see this issue.

pgbench? I wanted to run it. However, I'm still looking for where to get it, plus a guide for using it for testing.

pgbench is part of the Postgres distribution (src/bin/pgbench)

I would add such tests when building my PostgreSQL RPMs on AIX. So any help is welcome!

Performance.

- Also, I'd like to compare PostgreSQL performance on AIX vs Linux/PPC64. Any idea how I should proceed? Is there a PostgreSQL performance benchmark that I could find and use? pgbench?

pgbench is the most widely used tool for simulating an OLTP workload. Admittedly it is quite primitive and its results are rather artificial. TPC-C seems to be a better choice.
But the best option is to implement your own benchmark simulating the actual workload of your real application.

- I'm interested in any information for improving the performance & quality of my PostgreSQL RPMs on AIX. (As I already said, BullFreeware RPMs for AIX are free and can be used by anyone, like Perzl RPMs are. My company (ATOS/Bull) has been selling IBM Power machines under the Escala brand for ages (25 years this year).)

How to help?

How could I help improve the quality and performance of PostgreSQL on AIX?

We still have one open issue on AIX: see https://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg303094.html
It would be great if you could somehow help to fix this problem.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

