Re: Question about InvalidatePossiblyObsoleteSlot()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about InvalidatePossiblyObsoleteSlot()
Date: 2025-10-22 08:48:33
Message-ID: CAA4eK1Kvj+XAWoLFx7ipFqKY3EB+htAF_dfyACZVW4UEbyX1bw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 20, 2025 at 11:41 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On Fri, Oct 17, 2025 at 03:08:07PM -0700, Masahiko Sawada wrote:
> > On Fri, Oct 17, 2025 at 12:18 AM Bertrand Drouvot
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> > > do you think that's also safe to not
> > > invalidate the slots for effective_xmin and catalog_effective_xmin if they
> > > advance far enough?
> >
> > I find the same in xmin cases. ResolveRecoveryConflictWithSnapshot()
> > is called only during the recovery by the startup process, and it also
> > tries to invalidate possibly obsolete slots. Catalog tuples are not
> > removed as long as the startup calls
> > ResolveRecoveryConflictWithSnapshot() before actually removing the
> > tuples and it's busy in InvalidatePossiblyObsoleteSlot().
>
> I looked more closely at the xmin related cases and I agree with the above.
>
> > I might be
> > missing something but this is the reason why I'm confused with the
> > 818fefd8fd4 fix and the proposed change.
>
> Yeah so 818fefd8fd4 is well suited for tests consistency but in some rare cases
> it invalidates a slot while it would be safe not to do so.
>
> OTOH it looks to me that the initial pre-818fefd8fd4 intend was to invalidate
> the slot as per this comment:
>
> "
> /*
> * Re-acquire lock and start over; we expect to invalidate the
> * slot next time (unless another process acquires the slot in the
> * meantime).
> */
> "
>

The comment doesn't indicate the intent that we will invalidate the
slot after re-acquiring the lock even when the new conditions don't
warrant the slot to be invalidated. The comment could be improved
though.

> The fact that it could move forward far enough before we terminate the
> process holding the slot is a race condition due to the fact that we released
> the mutex.
>
> If the above looks right to you then 818fefd8fd4 is doing what was "initially"
> expected, do you agree?
>

Based on the discussion and points presented in this thread, I don't
agree. I feel we should restore behavior prior to 818fefd8fd4 and fix
the test case which relies on different messages.

> If so, then maybe it's fine to keep 818fefd8fd4 as is?
>

I don't think so.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-10-22 08:49:27 Re: CI: Add task that runs pgindent
Previous Message Michael Paquier 2025-10-22 08:44:38 Re: Add wal_fpi_bytes_[un]compressed to pg_stat_wal