Tighten SCRAM iteration parsing and bound libpq PBKDF2 work

From: Sehrope Sarkuni <sehrope(at)jackdb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Tighten SCRAM iteration parsing and bound libpq PBKDF2 work
Date: 2026-05-24 20:04:07
Message-ID: CAH7T-aqV9KVuO2yCpzTM2E0Pz=yJOMhK1PyAGt=od77_OztBVg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached are some SCRAM-related patches.

The first patch fixes parsing of SCRAM iteration counts in both the backend
SCRAM verifier parser and libpq's server-first-message parser. Previously
both paths parsed into a long and then stored the result in an int, so
values in (INT_MAX, LONG_MAX] could be accepted by strtol() and then
narrowed incorrectly. I don't think this allows for any invalid logins as
the password verifier would have tried to verify a different iteration
count, but it's still wrong.

The new common helper, scram_parse_iterations(), rejects values that are
not strictly decimal digits, rejects zero, and rejects values larger than
INT_MAX. This means values with leading whitespace, signs, trailing junk,
decimal points, hex-style prefixes, and integer overflow are all rejected
rather than being partially accepted by strtol() syntax.

Patch 0002 adds a small test_scram test module for direct tests of SCRAM
helpers in src/common. Currently it covers scram_parse_iterations(). I kept
this separate from the parser fix so the bug fix can be backpatched
independently if adding a new test module is considered too invasive for
older branches.

Patches 0003 through 0005 address a separate client-side resource issue. In
a SCRAM exchange, the PBKDF2 iteration count is supplied by the server. A
hostile or misconfigured server can advertise a very large iteration count
and keep a blocking libpq connection inside scram_SaltedPassword() beyond
the caller's connect_timeout. The backend has CHECK_FOR_INTERRUPTS() inside
the loop, but the frontend previously had no equivalent.

These patches mirror the blocking connection attempt's deadline into
PGconn, add an optional interrupt callback to the common SCRAM PBKDF2
helper, and have libpq use that callback to abort once the in-flight
blocking connection deadline has expired. The test modifies a SCRAM
verifier to advertise a very large iteration count and verifies that
connect_timeout interrupts the PBKDF2 loop.

This protection applies to the blocking connection path, such as
PQconnectdb(). It does not make connect_timeout apply automatically to
applications driving PQconnectPoll() themselves.

I considered passing the connection deadline directly into the SCRAM PBKDF2
helper, but used an interrupt callback instead. That keeps the common SCRAM
code independent of libpq's timeout representation and allows the same
mechanism to support other frontend abort conditions, such as cancellation
of an in-progress connection attempt.

Patch 0006 adds a separate libpq connection parameter,
scram_max_iterations, with matching PGSCRAMMAXITERATIONS environment
variable. This places a hard client-side cap on the server-advertised SCRAM
iteration count before PBKDF2 begins. A value of 0 disables the cap.

This is useful independently of connect_timeout as it protects clients that
do not set a timeout. It also protects async PQconnectPoll() users because
it does not depend on the blocking connection deadline.

The attached patch currently uses a default cap of 100K. That is well above
PostgreSQL's normal SCRAM iteration count (4K), but it is still a
client-side behavior change for installations using unusually high SCRAM
iteration counts. For reference, we added a similar patch to pgjdbc with
the same 100K default.

For the connection timeout work, I set up a mock postgres server that
advertises a huge SCRAM iteration count. Without this patch it takes
multiple seconds to perform the client PBKDF2 derivation. It still provides
the "timeout expired" error, but only after the PBKDF2 work is complete:

$ time PGCONNECT_TIMEOUT=1 psql postgresql://user:pass(at)127(dot)0(dot)0(dot)1/db
psql: error: connection to server at "127.0.0.1", port 5432 failed: timeout
expired

real 0m12.716s
user 0m12.613s
sys 0m0.015s

With the patches it rejects the connection attempt immediately (as it
exceeds the 100K default):

$ time LD_LIBRARY_PATH=src/interfaces/libpq PGCONNECT_TIMEOUT=1
src/bin/psql/psql postgresql://user:pass(at)127(dot)0(dot)0(dot)1/db
psql: error: connection to server at "127.0.0.1", port 5432 failed: server
requested SCRAM iteration count 10000000, exceeding scram_max_iterations
(100000)

real 0m0.059s
user 0m0.002s
sys 0m0.005s

Explicitly overriding the max iterations and it respects the 1-second
timeout:

$ time LD_LIBRARY_PATH=src/interfaces/libpq PGCONNECT_TIMEOUT=1
PGSCRAMMAXITERATIONS=0 src/bin/psql/psql postgresql://user:pass(at)127(dot)0(dot)0(dot)1/db
psql: error: connection to server at "127.0.0.1", port 5432 failed: could
not calculate client proof: connection timeout expired

real 0m1.005s
user 0m0.930s
sys 0m0.003s

Some specific things I would particularly like feedback on:

- Whether scram_max_iterations should default to 100K, some other value, or
0 to preserve existing behavior. The last option would still provide an
opt-in hard cap, but would not protect clients unless they configure it.
For reference, pgjdbc ships with a 100K default, but PostgreSQL deployments
with tuned scram_iterations in the 250K–1M range do exist and would need to
set the parameter explicitly on libpq upgrade.
- Whether scram_SaltedPassword() should grow the new interrupt callback
parameters directly rather than via the scram_SaltedPasswordExt() shim used
here. I went with a shim to avoid signature churn for any out-of-tree
consumers of src/common, but I have no strong attachment to the shim if
changing the signature is preferred.
- Whether the in-flight connect deadline should live on PGconn or be passed
through fe_scram_state instead. I went with PGconn on the theory that the
deadline of the current attempt is generally useful state that future auth
paths (GSSAPI, LDAP, etc.) might want to consult, but the narrower scope is
also reasonable.
- Whether the connect_timeout behavior (0003–0005) should be backpatched,
separately from the new scram_max_iterations connection parameter (0006),
which I assume is HEAD-only.
- Whether the iteration-count parsing fix (0001) should be backpatched. The
int-truncation half is safe to backpatch in isolation; tightening
parse_scram_secret() to reject zero/negative iteration counts in stored
verifiers is technically a behavior change though I think it'd be invalid
per the SCRAM RFC.
- Whether the TAP test in 0005, which depends on a 1-second connect_timeout
interrupting a doctored verifier's PBKDF2 loop, is acceptable as written.
The iteration count is large enough that the loop cannot complete in any
realistic CI scenario, but the test is timing-sensitive by nature.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachment Content-Type Size
0005-Add-test-for-connect_timeout-during-SCRAM-iteration.patch text/x-patch 2.0 KB
0004-libpq-honor-connect_timeout-during-SCRAM-iteration.patch text/x-patch 9.0 KB
0002-Add-test_scram-module-for-direct-unit-tests-of-SCRAM.patch text/x-patch 12.8 KB
0003-libpq-expose-connect-deadline-on-PGconn.patch text/x-patch 2.8 KB
0001-Fix-int-overflow-when-parsing-SCRAM-iteration-count.patch text/x-patch 5.9 KB
0006-libpq-add-scram_max_iterations-connection-parameter.patch text/x-patch 10.2 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2026-05-24 22:18:40 Re: meson: Make test output much more useful on failure (both in CI and locally)
Previous Message Enrique Sánchez 2026-05-24 18:34:52 Re: Extended statistics improvement: multi-column MCV missing values