Re: Adding REPACK [concurrently]

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-05-12 07:57:29
Message-ID: CAA4eK1L1SCa7LjOVzq3zmmqpSV9o7z7VBkPjbE6=2iRLTSBvXw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 12, 2026 at 1:00 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > On Tue, May 5, 2026 at 6:17 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > > >
> > > > Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > > >
> > > > I think the problem is that with database-specific snapshot,
> > > > SnapBuildProcessRunningXacts() returns early, w/o adjusting builder->xmin
> > > >
> > > > /*
> > > > * Database specific transaction info may exist to reach CONSISTENT state
> > > > * faster, however the code below makes no use of it. Moreover, such
> > > > * record might cause problems because the following normal (cluster-wide)
> > > > * record can have lower value of oldestRunningXid. In that case, let's
> > > > * wait with the cleanup for the next regular cluster-wide record.
> > > > */
> > > > if (OidIsValid(running->dbid))
> > > > return;
> > > >
> > > > and thus some transactions whose XID is below running->oldestRunningXid may
> > > > continue to be incorrectly considered running.
> > > >
> > > > I originally thought that this should not happen because such transactions
> > > > will be added to the builder's array of committed transactions by
> > > > SnapBuildCommitTxn() anyway. However, I failed to notice that COMMIT record of
> > > > a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> > > > follow the xl_running_xacts record in WAL. In other words, even if
> > > > xl_running_xacts is created before a COMMIT record of the contained
> > > > transaction, it may end up at higher LSN in WAL. So the cleanup I relied on
> > > > might not take place.
> > > >
> > >
> > > BTW, is it possible to write a test by using injection_points or via
> > > manual steps (by using debugger, etc) so that we can more clearly
> > > understand this problem and proposed fix?
> >
> > So far I could observe the situation in WAL, but have no idea how it can
> > happen. For example, transaction 49242 gets committed here
> >
> > rmgr: Transaction len (rec/tot): 46/ 46, tx: 49242, lsn: 0/18BC28C8, prev
> > 0/18BC2890, desc: COMMIT 2026-05-11 16:38:16.603265 CEST
> >
> > and then it appears in the 'xids' list of RUNNING_XACTS:
> >
> > rmgr: Standby len (rec/tot): 106/ 106, tx: 0, lsn:
> > 0/18BC3140, prev 0/18BC3100, desc: RUNNING_XACTS nextXid 49255
> > latestCompletedXid 49241 oldestRunningXid 49242; 13 xacts: 49248 49249 49246
> > 49243 49252 49251 49244 49245 49242 49250 49253 49254 49247; dbid:5
> >
> >
> > I thought the situation is quite common (and therefore nothing of
> > SnapBuildProcessRunningXacts() should be skipped), but when trying to
> > reproduce the problem, I noticed that LogStandbySnapshot() shouldn't allow
> > that ordering issue when logical decoding is enabled:
> >
> > /*
> > * GetRunningTransactionData() acquired ProcArrayLock, we must release it.
> > * For Hot Standby this can be done before inserting the WAL record
> > * because ProcArrayApplyRecoveryInfo() rechecks the commit status using
> > * the clog. For logical decoding, though, the lock can't be released
> > * early because the clog might be "in the future" from the POV of the
> > * historic snapshot. This would allow for situations where we're waiting
> > * for the end of a transaction listed in the xl_running_xacts record
> > * which, according to the WAL, has committed before the xl_running_xacts
> > * record. Fortunately this routine isn't executed frequently, and it's
> > * only a shared lock.
> > */
> > if (!logical_decoding_enabled)
> > LWLockRelease(ProcArrayLock);
> >
> > So I don't have the answer right now.
>
> I think now that "waiting for the end of a transaction listed in the
> xl_running_xacts record" in the comment is about transaction removal from
> procarray. However, the COMMIT record can still be ahead of xl_running_xacts
> because RecordTransactionCommit() is called before
> ProcArrayEndTransaction().
>

I see your point. Due to this, once the xmin regresses based on
cluster-wide running_xact, some transaction could appear to be running
when it should have appeared as committed. However, still it is not
clear how it could lead to one update in the transaction as
successfully decoded and another one to be skipped. One theory could
be that before the second update, somehow invalidation happens and
when decoding tries to reload the catalog to decode second update, the
relation is not visible because xmin has regressed and the update is
somehow skipped. I can't see how it can happen in code but something
like that is happening. Assuming, the problematic case is something
like what I described, even than the fix of skipping cluster-wide
running xacts and instead LOG db-specific running_xacts to help
updating builder's xmin sounds inelegant and probably inefficient. For
example, I think such a dependency means we can never enable
db-specific snapshots on standby.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ayush Tiwari 2026-05-12 08:02:44 Re: [PATCH] Fix column name escaping in postgres_fdw stats import
Previous Message Etsuro Fujita 2026-05-12 07:36:12 Re: [PATCH] Fix column name escaping in postgres_fdw stats import