RE: Assertion failure in SnapBuildInitialSnapshot()

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Pradeep Kumar <spradeepkumar29(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Assertion failure in SnapBuildInitialSnapshot()
Date: 2025-11-07 02:59:59
Message-ID: TY4PR01MB169070EE618FA2908B3D2F2AE94C3A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, November 7, 2025 2:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Thu, Nov 6, 2025 at 2:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Thu, Nov 6, 2025 at 12:03 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Thursday, October 30, 2025 7:01 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > Also, I think it's worth considering the idea Robert shared before[1]:
> > > >
> > > > ---
> > > > But what about just surgically preventing that?
> > > > ProcArraySetReplicationSlotXmin() could refuse to retreat the values,
> > > > perhaps? If it computes an older value than what's there, it just does
> nothing?
> > > > ---
> > > >
> > > > We did a similar fix for confirmed_flush LSN by commit ad5eaf390c582,
> and it
> > > > sounds reasonable to me that ProcArraySetReplicationSlotXmin()
> refuses to
> > > > retreat the values.
> > >
> > > I reviewed the thread and think that we could not straightforwardly apply a
> > > similar strategy to prevent the retreat of xmin/catalog_xmin here. This is
> > > because we maintain a central value
> > > (replication_slot_xmin/replication_slot_catalog_xmin) in
> > > ProcArraySetReplicationSlotXmin, where the value is expected to decrease
> when
> > > certain slots are dropped or invalidated.
> > >
> >
> > Good point. This can happen when the last slot is invalidated or dropped.
>
> After the last slot is invalidated or dropped, both slot_xmin and
> slot_catalog_xmin values are set InvalidTransactionId. Then in this
> case, these values are ignored when computing the oldest safe decoding
> XID in GetOldestSafeDecodingTransactionId(), no? Or do you mean that
> there is a case where slot_xmin and slot_catalog_xmin retreat to a
> valid XID?

I think that when replication_slot_xmin is invalid,
GetOldestSafeDecodingTransactionId() returns nextXid, which can be greater
than the original snap.xmin if some transaction IDs have been assigned. After
reviewing the report [1], the bug also appears to be reproducible when
replication_slot_xmin is set to InvalidTransactionId (specific reproduction
steps are detailed in [2]). Therefore, if we adopt the approach of preventing
these values from retreating, we would need to somehow avoid resetting
replication_slot_xmin, but that seems to conflict with the existing behavior
of resetting replication_slot_xmin when the last slot is dropped.
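
To make the concern concrete, here is a rough, simplified model of the
computation described above. It is not the actual PostgreSQL code: the
ModelXid type, the ModelState struct and model_oldest_safe_decoding_xid() are
hypothetical names for illustration, XID wraparound is ignored, and the XIDs
of running transactions (which the real function also takes into account) are
left out. It only shows that once the slot xmin is reset to invalid, the
result falls back to nextXid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins, not the real PostgreSQL definitions. */
typedef uint32_t ModelXid;
#define MODEL_INVALID_XID ((ModelXid) 0)

/* Hypothetical snapshot of the shared values discussed in this thread. */
typedef struct ModelState
{
    ModelXid next_xid;                      /* next XID to be assigned */
    ModelXid replication_slot_xmin;         /* invalid when no slot holds it */
    ModelXid replication_slot_catalog_xmin; /* invalid when no slot holds it */
} ModelState;

/*
 * Start from next_xid and lower the result to a slot horizon only when that
 * horizon is valid. Plain integer comparison is used here, so wraparound is
 * deliberately not modelled.
 */
static ModelXid
model_oldest_safe_decoding_xid(const ModelState *st, bool catalog_only)
{
    ModelXid oldest = st->next_xid;

    assert(st->next_xid != MODEL_INVALID_XID);

    if (st->replication_slot_xmin != MODEL_INVALID_XID &&
        st->replication_slot_xmin < oldest)
        oldest = st->replication_slot_xmin;

    /* The catalog horizon is only usable for catalog-only snapshots. */
    if (catalog_only &&
        st->replication_slot_catalog_xmin != MODEL_INVALID_XID &&
        st->replication_slot_catalog_xmin < oldest)
        oldest = st->replication_slot_catalog_xmin;

    return oldest;
}
```

In this model, with next_xid = 1000 and replication_slot_xmin = 900 the result
is 900. If the last slot is then dropped (replication_slot_xmin becomes
invalid) and next_xid has advanced to 1005, the result is 1005, which is past
a snap.xmin that was computed earlier.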

[1] https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com

Best Regards,
Hou zj
