Re: Add max_wal_replay_size connection parameter to libpq

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add max_wal_replay_size connection parameter to libpq
Date: 2026-03-29 23:51:27
Message-ID: CAHg+QDffk2NSTTvobAqqBmpN+DTZDJ3cdwsi5XxOUmY-MdKgwA@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Sun, Mar 29, 2026 at 11:53 AM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
wrote:

>
>
> On 29/03/2026 20:31, SATYANARAYANA NARLAPURAM wrote:
> > What if none of them meets the criteria? You fail the connection?
> > Wouldn't it cause an availability issue?
>
>
> Yes, the connection fails if no host meets the threshold. This is
> intentional, and it is consistent with the existing behaviour of
> target_session_attrs: if you set target_session_attrs=standby and no
> standby is reachable, the connection fails too.
>
>
> > If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
> > missing primary_conninfo or a disconnected upstream), the backlog cannot
> > be determined. In that case, the standby is treated as exceeding the
> > threshold and is skipped.
> >
> >
> > When a standby is replaying WAL from the archive, it can still be
> > caught up. Skipping it in that case doesn't seem right to me.
>
>
> I totally see your point here. The issue is that
> pg_last_wal_receive_lsn() returns NULL when there is no WAL receiver
> process -- regardless of how current the data actually is. Without a
> receive LSN, the metric this parameter is based on (receive_lsn -
> replay_lsn) is simply undefined for that standby.
>
> Please let me know if I am missing something here.
>
>
> >
> > This parameter measures only the apply lag on the standby itself, i.e.,
> > how much already-received WAL remains to be replayed. It does not
> > attempt to measure how far the standby is behind the primary. In
> > particular, a standby that is slow to receive WAL but fast to replay it
> > may report a small backlog here while still being significantly behind.
> >
> >
> > IMHO, this change does not appear to meet the objective of routing
> > connections/queries to the most up-to-date standby.
>
>
> The parameter's objective is not to route to the most up-to-date
> standby; it is to skip standbys whose apply lag exceeds a given threshold.
>

What is the expected benefit of this kind of routing? Is it fresher data
for the client, or is it keeping user connections off a lagging standby so
that it can catch up with the primary? The original description of the
parameter was about freshness.
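
To make the distinction concrete, here is a minimal sketch in Python of the
check as it is described in this thread. It is not code from the patch: the
helper names are hypothetical, and expressing the threshold in bytes is an
assumption. It only shows that a standby can pass the apply-backlog test
(receive_lsn - replay_lsn) while still being well behind the primary.

```python
from typing import Optional


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000000' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def apply_backlog(receive_lsn: Optional[str], replay_lsn: str) -> Optional[int]:
    """Bytes of received-but-not-yet-replayed WAL, or None if undefined.

    receive_lsn is None when pg_last_wal_receive_lsn() returns NULL,
    i.e. there is no WAL receiver (for example during archive recovery).
    """
    if receive_lsn is None:
        return None
    return parse_lsn(receive_lsn) - parse_lsn(replay_lsn)


def is_eligible(receive_lsn: Optional[str], replay_lsn: str,
                max_backlog_bytes: int) -> bool:
    """Hypothetical host check: an undefined backlog counts as over the limit."""
    backlog = apply_backlog(receive_lsn, replay_lsn)
    return backlog is not None and backlog <= max_backlog_bytes


# Assumed threshold for illustration only.
LIMIT = 16 * 1024 * 1024

# Standby that receives slowly but replays quickly: backlog is zero,
# yet it is 0x2000000 bytes behind a primary sitting at 0/5000000.
primary_lsn = "0/5000000"
receive_lsn = replay_lsn = "0/3000000"

passes_check = is_eligible(receive_lsn, replay_lsn, LIMIT)
bytes_behind_primary = parse_lsn(primary_lsn) - parse_lsn(replay_lsn)

# Standby replaying from the archive with no WAL receiver: always skipped,
# regardless of how current its data is.
archive_only_passes = is_eligible(None, replay_lsn, LIMIT)
```

In this sketch the first standby is accepted even though it trails the
primary by more than the threshold, and the archive-only standby is rejected
even if it happens to be fully caught up.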

Thanks,
Satya
