Re: AIX support

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Aditya Kamath <Aditya(dot)Kamath1(at)ibm(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Srirama Kucherlapati <sriram(dot)rk(at)in(dot)ibm(dot)com>, "peter(at)eisentraut(dot)org" <peter(at)eisentraut(dot)org>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "tristan(at)partin(dot)io" <tristan(at)partin(dot)io>, "postgres-ibm-aix(at)wwpdl(dot)vnet(dot)ibm(dot)com" <postgres-ibm-aix(at)wwpdl(dot)vnet(dot)ibm(dot)com>
Subject: Re: AIX support
Date: 2026-02-03 16:33:20
Message-ID: 1299410.1770136400@sss.pgh.pa.us
Lists: pgsql-hackers

Just to play devil's advocate for a minute:

I've managed to run check-world (with --enable-tap-tests, but few
optional features) on the GCC compile farm's AIX 7.3 machine,
cfarm119.cfarm.net.

It took more than five hours.

bash-5.3$ time make -s check-world -j2 PROVE_FLAGS='--quiet --nocolor --nocount' >/dev/null

real 311m10.942s
user 4m26.936s
sys 4m13.243s

(I can't go higher than -j2 due to the machine's restrictive ulimit -u
setting. Perfectly reasonable policy for a shared resource, and
that's not what I'm griping about.)

This compares unfavorably to my 2004-vintage Mac PPC G4 laptop
(running NetBSD 10.1), let alone anything remotely modern. The G4
needs about three-and-two-thirds hours for substantially the same
test:

$ time make -s check-world -j2 PROVE_FLAGS='--quiet --nocolor --nocount' >/dev/null
13188.92s real 1229.68s user 1210.25s system

If the user/system times are to be trusted, the AIX machine is indeed
several times faster than the G4 CPU-wise, so why is it so slow?
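
To make that comparison concrete, here is a small sketch that redoes the arithmetic on the two `time` readouts above. The numbers are copied from the runs quoted in this message; the helper names (`to_seconds`, `summarize`) are made up for illustration, and the result is only as trustworthy as the reported user/system times.

```python
# Timings copied from the two "time make -s check-world -j2" runs above.
# Helper names are hypothetical; nothing here comes from PostgreSQL itself.

def to_seconds(minutes, seconds):
    """Convert a bash-style 'XmY.YYYs' reading to seconds."""
    return minutes * 60 + seconds

aix = {
    "real": to_seconds(311, 10.942),
    "user": to_seconds(4, 26.936),
    "sys": to_seconds(4, 13.243),
}
g4 = {"real": 13188.92, "user": 1229.68, "sys": 1210.25}

def summarize(t):
    """Return wall-clock hours, total CPU seconds, and CPU share of wall time."""
    cpu = t["user"] + t["sys"]
    return {
        "hours": t["real"] / 3600,
        "cpu_seconds": cpu,
        "cpu_share": cpu / t["real"],
    }

aix_s = summarize(aix)
g4_s = summarize(g4)

# How many times more CPU seconds the G4 burned for the same test run.
cpu_ratio = g4_s["cpu_seconds"] / aix_s["cpu_seconds"]

for name, s in (("cfarm119", aix_s), ("g4", g4_s)):
    print(f"{name}: {s['hours']:.2f} h wall, "
          f"{s['cpu_seconds']:.1f} s CPU, "
          f"{100 * s['cpu_share']:.1f}% of wall time on CPU")
print(f"G4 CPU seconds / AIX CPU seconds = {cpu_ratio:.2f}")
```

Under those assumptions the AIX machine spends under 3% of its wall-clock time on CPU, against roughly 18% for the G4, while needing between four and five times less CPU time in total. Nearly all of the five hours is spent waiting on something other than the CPU.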

Apparently, because its file system sucks. One thing we do over and
over in the TAP tests is to copy an initialized data directory to
prepare a new instance, basically "cp -RPp template-dir $PGDATA".
I'm observing that taking about 22 seconds on the AIX machine
(which is actually slower than running initdb would be: about 15s),
compared to 2.6s on the G4, and about 0.035s on my Linux workstation
(which can do the same overall -j2 check-world in five minutes).
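
For anyone wanting to reproduce the copy timing on another machine, something along these lines could serve as a rough stand-in. It is a sketch only: `shutil.copytree` with `symlinks=True` approximates `cp -RPp` (it keeps symlinks as symlinks and copies mode bits and timestamps, but not ownership), and the dummy tree built by the hypothetical helper `make_dummy_tree` is not a real initialized data directory, so absolute numbers will not match those quoted above.

```python
import os
import shutil
import tempfile
import time

def make_dummy_tree(root, n_files=200, size=8192):
    """Create a directory of small files as a stand-in for a template dir.

    This is an assumption for illustration; a real data directory has a
    different file count and size distribution.
    """
    os.makedirs(root)
    for i in range(n_files):
        with open(os.path.join(root, f"file{i:04d}"), "wb") as f:
            f.write(b"\0" * size)

def time_tree_copy(src, dst):
    """Copy src to dst (dst must not exist) and return elapsed seconds."""
    start = time.perf_counter()
    # symlinks=True keeps symlinks as links, roughly like cp -P;
    # the default copy function (copy2) keeps mode and timestamps.
    shutil.copytree(src, dst, symlinks=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "template-dir")
        dst = os.path.join(tmp, "pgdata-copy")
        make_dummy_tree(src)
        elapsed = time_tree_copy(src, dst)
        print(f"copied {len(os.listdir(dst))} files in {elapsed:.3f} s")
```

Pointing `src` at a real template directory instead of the dummy tree would give a figure closer to what the TAP tests actually pay per new instance.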

To be clear, there is as far as I can tell next to zero background
I/O load on cfarm119. This is a typical readout when I'm not
running anything:

$ iostat 1

System configuration: lcpu=40 drives=12 ent=4.00 paths=10 vdisks=6

tty:      tin   tout    avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
          0.0   70.0                24.9   33.5    41.6       0.0    4.0   100.5

Disks:    % tm_act    Kbps     tps  Kb_read  Kb_wrtn
cd1            0.0     0.0     0.0        0        0
cd0            0.0     0.0     0.0        0        0
hdisk1         0.0     0.0     0.0        0        0
hdisk4         0.0     0.0     0.0        0        0
hdisk9         0.0     0.0     0.0        0        0
hdisk8         0.0     0.0     0.0        0        0
hdisk7         0.0     0.0     0.0        0        0
hdisk6         0.0     0.0     0.0        0        0
hdisk3         0.0     0.0     0.0        0        0
hdisk2       100.0   640.0   160.0        0      640
hdisk5         0.0     0.0     0.0        0        0
hdisk0         0.0     0.0     0.0        0        0

Unless there is something seriously wrong with how cfarm119 is set up,
the conclusion has to be that AIX is mind-bogglingly bad at disk I/O.

This conclusion is borne out by some simple pgbench testing:
the AIX machine performs somewhat-respectably on select-only
tests, or on pgbench's default test with fsync off, but on the
default test with fsync on, it gets less than half the TPS rate of the G4:

cfarm119:
bash-5.3$ pgbench -T 60 -j 4 -c 4 bench
pgbench (19devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 4
number of threads: 4
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 2493
number of failed transactions: 0 (0.000%)
latency average = 96.455 ms
initial connection time = 33.624 ms
tps = 41.469913 (without initial connection time)

g4:
[tgl@g42]$ pgbench -T 60 -j 4 -c 4 bench
pgbench (19devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 4
number of threads: 4
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 5767
number of failed transactions: 0 (0.000%)
latency average = 41.619 ms
initial connection time = 122.045 ms
tps = 96.109550 (without initial connection time)
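
As a quick cross-check of the two pgbench readouts, the sketch below recomputes the TPS ratio and compares each reported average latency with clients divided by TPS. All figures are copied from the output above; the dictionary layout and the `latency_from_tps` helper are illustrative assumptions, and the latency relation is simply what one would expect when every client runs transactions back to back.

```python
# Figures copied from the two pgbench runs above (4 clients, 60 s each).
runs = {
    "cfarm119": {"clients": 4, "transactions": 2493,
                 "tps": 41.469913, "latency_ms": 96.455},
    "g4": {"clients": 4, "transactions": 5767,
           "tps": 96.109550, "latency_ms": 41.619},
}

def latency_from_tps(clients, tps):
    """Average latency in ms if each client issues transactions back to back."""
    return clients / tps * 1000.0

for name, r in runs.items():
    derived = latency_from_tps(r["clients"], r["tps"])
    print(f"{name}: reported {r['latency_ms']:.3f} ms, "
          f"derived {derived:.3f} ms")

# Fraction of the G4's throughput that the AIX machine achieved.
tps_ratio = runs["cfarm119"]["tps"] / runs["g4"]["tps"]
print(f"cfarm119 TPS / g4 TPS = {tps_ratio:.3f}")
```

The derived latencies agree with the reported ones to within rounding, and the ratio comes out at about 0.43, so the AIX machine delivers somewhat less than half the G4's throughput on this test.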

Remind me again why anyone would choose to run Postgres on this
platform? Why are we moving mountains to make it possible?

regards, tom lane
