| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Cc: | Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com> |
| Subject: | new connection establishment (pgbench --connect) slow with pgbouncer due to libpq/OpenSSL global thread contention |
| Date: | 2026-05-28 07:25:00 |
| Message-ID: | CAKZiRmw6jbShG+LJe0pY5G4w0ktoGwzPozsA=LhH_mN9wC+bhg@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi -hackers,
TL;DR: this is just a report of a performance inefficiency in pgbench/libpq
caused by the use of a legacy OpenSSL API when stress-testing new connections.
After a short chat with Jacob we agreed to put it here, so this is just a
report to be used as a known limitation, without a fix/patch, and apparently no
workaround exists either (other than using an older OpenSSL version).
Context: I was stress-testing how many more new connections/second one can
get by using pgbouncer (how much more efficient it is), using pgbench's
--connect:
a) postmaster, PGSSLMODE=disable PGPASSWORD=abc123
/usr/pgsql19/bin/pgbench --connect -c 10 -j 10 -f select1.sql -p 10019
-U app db1 -P 1 -T 300 -r
and got around 3000 new conns/s
pgbench itself was @ ~25% CPU, postmaster of course @ 100% CPU
b) with pgbouncer, PGSSLMODE=disable PGPASSWORD=abc123
/usr/pgsql19/bin/pgbench --connect -c 10 -j 10 -f select1.sql -p 6432
-U app db1 -P 1 -T 300 -r # so just port change
and to my surprise got just ~350 new conns/s
and pgbench was at 1000% CPU (that's NOT a typo, 10 cores at 100% CPU)
I had the same result using older pgbench versions.
Wild, right? I thought I had screwed something up here, but perf(1) on
pgbench (so the client!) literally told me that it is SCRAM (I used the
recommended auth_type=scram-sha-256) causing client-side glibc pthreads
global contention: please see the attached screenshot of the perf report (as I
don't have a text copy with me anymore).
After changing pgbouncer's auth_type to "plain" to cross-check, I got the
expected boost: 58k new conns/s, still with just 20 pooled backends (so that's
the wall of a single pgbouncer on a single core: 30k..60k new conns/second,
10-20x more than postmaster). But it also indicates that pgbouncer is doing
something different with SCRAM than the server does - I haven't dug into that
yet, no bandwidth. Also, HMAC_Init_ex(3) has been deprecated since
OpenSSL 3.0, so maybe there's a better way (and that part is in libpq/core).
It looks to me more like a libpq/pgbench issue than a pgbouncer issue (well,
it's inside pg_fe_sendauth(), so any heavily threaded libpq client is affected
in theory).
One may say no one should need those rates of new connections per second,
especially from a single application; full agreement. Yet at the same time it
is still somewhat worrying that pgbench --connect against pgbouncer got about
10% of what the same pgbench --connect against postmaster can get.
md5 is also affected as HMAC_Init_ex(3) is also called there.
Legacy distributions (like RHEL8) use the older OpenSSL 1.1.1k, so they
are not affected. This is probably due to the new provider architecture in
OpenSSL 3.x. Gemini told me that HMAC_Init_ex(3) was retrofitted for backward
compatibility and that, under the hood, calling HMAC_Init_ex(3) now implicitly
forces OpenSSL to query its own internal provider registry via internal fetch
routines (like EVP_MAC_fetch() and EVP_MD_fetch()) to locate the SHA-256
digest implementation. Querying this provider registry requires acquiring
an internal OpenSSL global read/write lock. So anyway, that part of the code
could probably be modernized to the provider-aware EVP_MAC API
(EVP_MAC_fetch() + EVP_MAC_CTX_new() once and EVP_MAC_CTX_dup() per
connection thread). I haven't looked at how that fits the current libpq
APIs/pgbench code (if it is possible at all). I was able to find thread [1],
which mentions why it was written this way:
> I think that this is a bit too new to use though, as we need to
> support OpenSSL down to 1.0.1 on HEAD and because there are
> compatibility macros. So instead I have decided to rely on the older
> interface based on HMAC_Init_ex()
I understand that this would be somehow related to dropping support for legacy
OpenSSL (?).
-J.
[1] - https://www.postgresql.org/message-id/flat/X9m0nkEJEzIPXjeZ%40paquier.xyz
| Attachment | Content-Type | Size |
|---|---|---|
| pgbench_openssl_pthread_contention.png | image/png | 899.1 KB |