Re: Question about InvalidatePossiblyObsoleteSlot()

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>
Cc: michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "amit(dot)kapila16" <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about InvalidatePossiblyObsoleteSlot()
Date: 2025-10-13 06:27:25
Message-ID: aOybzXERfbV3CG/b@ip-10-97-1-34.eu-west-3.compute.internal
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Oct 09, 2025 at 10:49:39AM +0800, suyu.cmj wrote:
> Hi,
> Thank you for the reference to commit 818fefd8fd4 and the related discussion thread. I understand the intent of introducing initial_restart_lsn was to preserve a consistent invalidation cause throughout the invalidation loop.
> However, I still have a few concerns about this design change:
> 1. I understand the intention to keep the invalidation cause consistent, but If a slot's restart_lsn advances significantly during the invalidation check—indicating it is actively in use—shouldn't we reconsider invalidating it?
> 2. What potential issues arise if we refrain from invalidating slots whose restart_lsn advances during the invalidation process? Intuitively, an actively used slot that has moved it's restart_lsn beyond the problematic point should not be marked invalid.
> 3. If the current approach is indeed correct, should we consider making PG15 and earlier consistent with this behavior? The behavioral difference across versions may lead to different operational outcomes in otherwise similar situations.
> I would appreciate your insights on these points.

I agree that before 818fefd8fd4 the invalidation cause could move from
RS_INVAL_WAL_REMOVED to RS_INVAL_NONE if the slot restart lsn has been able to
advance enough between the time we release the mutex and do the next check.

With 818fefd8fd4 that's not the case anymore and we keep WAL_REMOVED as the
invalidation cause (even if the slot restart lsn has been able to advance
enough).

That looks safe to use the pre 818fefd8fd4 behavior for the slot restart lsn
case because the WAL files have not yet been removed by the checkpointer/startup
process when it's busy in InvalidatePossiblyObsoleteSlot().

I think that we could get rid of the initial_restart_lsn and just use
s->data.restart_lsn here (while keeping initial xmin ones to preserve the
intent of 818fefd8fd4 for those).

Something like the attached?

Your concern is only about the restart_lsn, right? (asking because I don't think
it's safe not to rely on the initial_* for the xmin ones, see [1]).

[1]: https://www.postgresql.org/message-id/ZaACEPtVovKlRVUf%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-Don-t-use-initial_restart_lsn-in-InvalidatePossib.patch text/x-diff 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul A Jungwirth 2025-10-13 06:43:20 Re: SQL:2011 Application Time Update & Delete
Previous Message Peter Smith 2025-10-13 05:44:42 Re: Add support for specifying tables in pg_createsubscriber.