| From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Add max_wal_replay_size connection parameter to libpq |
| Date: | 2026-03-29 17:56:25 |
| Message-ID: | 126eb1e4-d98e-4647-b629-517adbcad28e@uni-muenster.de |
| Lists: | pgsql-hackers |
Hi,
When connecting with target_session_attrs=standby (or prefer-standby,
read-only, any) and multiple standbys are available, libpq currently
selects the first acceptable candidate without regard for how "current"
its data is. A standby configured with recovery_min_apply_delay,
experiencing slow I/O, or otherwise lagging is treated the same as one
that is fully caught up.
I would like to propose a new libpq connection parameter,
max_wal_replay_size, that allows clients to skip standby servers whose
WAL replay backlog exceeds a given threshold.
Example:

    psql "host=host1,host2,host3 port=5111,5222,5333 \
          target_session_attrs=standby max_wal_replay_size=16MB"
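
For illustration only, here is a minimal Python sketch of how a value such as 16MB could be turned into a byte count. It assumes GUC-style units with 1024-based multipliers and treats a bare number as bytes. Both are assumptions rather than a description of what the patch does, and parse_size is a hypothetical helper name.

```python
import re

# Assumption: GUC-style memory units with 1024-based multipliers.
# The actual patch may accept a different set of units.
_UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as '16MB' into a number of bytes.

    Hypothetical helper: a bare number is interpreted as bytes.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    unit = unit or "B"
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(number) * _UNITS[unit]
```

Under these assumptions, parse_size("16MB") yields 16777216, which is the threshold the backlog would be compared against.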
When this parameter is set, libpq executes a small query during
connection establishment to evaluate:

    pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())

on the standby. If the result exceeds the specified threshold, the
server is skipped and the next host in the list is tried. The check is
skipped entirely when target_session_attrs is set to primary or
read-write, since those modes already exclude standbys.
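
The selection rule described above can be sketched in Python as follows. This is not the patch's C code. It models the LSN difference on the textual LSN form (two hexadecimal numbers separated by a slash) and assumes every candidate is a standby that reports both positions. The names parse_lsn, replay_backlog, lag_check_applies and pick_standby are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_backlog(receive_lsn: str, replay_lsn: str) -> int:
    """Bytes of received WAL not yet replayed (models pg_wal_lsn_diff)."""
    return parse_lsn(receive_lsn) - parse_lsn(replay_lsn)


def lag_check_applies(target_session_attrs: str) -> bool:
    """The check is skipped for modes that already exclude standbys."""
    return target_session_attrs not in ("primary", "read-write")


def pick_standby(
    candidates: Iterable[Tuple[str, str, str]],
    threshold_bytes: int,
) -> Optional[str]:
    """Return the first host whose backlog does not exceed the threshold.

    Each candidate is (host, receive_lsn, replay_lsn), tried in list order.
    Returns None when every candidate exceeds the threshold.
    """
    for host, receive_lsn, replay_lsn in candidates:
        if replay_backlog(receive_lsn, replay_lsn) > threshold_bytes:
            continue  # too far behind: try the next host in the list
        return host
    return None
```

With a 16 MiB threshold, a first host at receive 0/5000000 and replay 0/3000000 (32 MiB backlog) would be skipped, and a second host only 256 bytes behind would be chosen.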
If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
missing primary_conninfo or a disconnected upstream), the backlog cannot
be determined. In that case, the standby is treated as exceeding the
threshold and is skipped.
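
As a minimal sketch of the NULL rule (again Python for illustration, not the patch itself), an unknown backlog can be modelled as None and treated as over the limit. The function name should_skip is hypothetical.

```python
from typing import Optional


def should_skip(backlog_bytes: Optional[int], threshold_bytes: int) -> bool:
    """Decide whether a standby is skipped.

    backlog_bytes is None when the receive LSN is NULL, so the backlog
    cannot be determined; such a standby is treated as exceeding the
    threshold.
    """
    if backlog_bytes is None:
        return True
    return backlog_bytes > threshold_bytes
```

The conservative choice here is that a standby that cannot prove it is within the limit is never selected on the strength of missing data.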
This parameter measures only the apply lag on the standby itself, i.e.,
how much already-received WAL remains to be replayed. It does not
attempt to measure how far the standby is behind the primary. In
particular, a standby that is slow to receive WAL but fast to replay it
may report a small backlog here while still being significantly behind.
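
A small worked example may make the distinction concrete. The positions below are invented for illustration, and the helper names are hypothetical. The sketch only shows that the measured quantity (receive minus replay) can be small while the distance to the primary is large.

```python
MB = 1024 * 1024


def apply_backlog(receive_pos: int, replay_pos: int) -> int:
    """Received-but-not-replayed WAL: what the proposed parameter measures."""
    return receive_pos - replay_pos


def total_lag(primary_pos: int, replay_pos: int) -> int:
    """Distance from the primary's WAL position: not measured by the check."""
    return primary_pos - replay_pos


# Hypothetical byte positions for a standby that receives WAL slowly
# but replays whatever it has received quickly.
primary_pos = 500 * MB
receive_pos = 120 * MB
replay_pos = 118 * MB

backlog = apply_backlog(receive_pos, replay_pos)   # 2 MB
behind_primary = total_lag(primary_pos, replay_pos)  # 382 MB
```

With max_wal_replay_size=16MB this standby would pass the check, since its backlog is 2 MB, even though it is 382 MB behind the primary.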
The attached PoC patch may make the behaviour clearer.
Any feedback on this approach would be appreciated.
Best, Jim
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Add-libpq-connection-parameter-max_wal_replay_siz.patch | text/x-patch | 25.6 KB |