Re: Implement waiting for wal lsn replay: reloaded

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2026-01-09 13:44:17
Message-ID: CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > > I see, you were right. This is not related to the MyProc->xmin.
> > > ResolveRecoveryConflictWithTablespace() calls
> > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid). That
> > > would kill WAIT FOR LSN query independently on its xmin.
> >
> > I think the concern is valid --- conflicts like
> > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > backend if the timing is unlucky. It's more difficult to reproduce
> > though. A check for the log containing "conflict with recovery" would
> > likely catch these conflicts as well.
>
> Yes, I found multiple reasons why xmin gets temporarily set during
> processing of WAIT FOR LSN query. I'll soon post a draft patch to fix
> that.
>
> > > I guess your
> > > patch is the only way to go. It's clumsy to wrap WAIT FOR LSN call
> > > with retry loop, but it would still consume less resources than
> > > polling.
> > >
> >
> > Assuming recovery conflicts are relatively rare in tap tests, except
> > for the explicitly designed tests like 031_recovery_conflict and the
> > narrow timing window that the standby has not caught up while the wait
> > for gets invoked, a simple fallback seems appropriate to me.
>
> Yes, I see. Seems acceptable given this seems the only feasible way to go.
>

Here is the updated patch with recovery conflicts handled.

--
Best,
Xuneng

Attachment Content-Type Size
v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch application/octet-stream 6.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2026-01-09 13:52:59 Re: pg_plan_advice
Previous Message Dilip Kumar 2026-01-09 13:42:53 Re: Proposal: Conflict log history table for Logical Replication