From: | Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | stanhu(at)gmail(dot)com, x4mmm(at)yandex-team(dot)ru, 9erthalion6(at)gmail(dot)com, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: lastOverflowedXid does not handle transaction ID wraparound |
Date: | 2021-11-03 17:50:54 |
Message-ID: | CANbhV-F-EAPh+HwYEpRo3v6m6_O9AOmthM=+tUsXY==MG+nV4A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 21 Oct 2021 at 05:01, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 20 Oct 2021 08:55:12 -0700, Stan Hu <stanhu(at)gmail(dot)com> wrote in
> > On Wed, Oct 20, 2021 at 4:00 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >
> > >
> > >
> > > > 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
> > > написал(а):
> > > > I wonder what would be side
> > > > effects of clearing it when the snapshot is not suboverfloved anymore?
> > >
> > > I think we should just invalidate lastOverflowedXid on every
> > > XLOG_RUNNING_XACTS if subxid_overflow == false. I can't find a reason not
> > > to do so.
I believe that to be an incorrect fix, but so very nearly correct.
There is a documented race condition in the generation of a
XLOG_RUNNING_XACTS that means there could be a new overflow event
after the snapshot was taken but before it was logged.
> > On a replica, I think it's possible for lastOverflowedXid to be set even if
> > subxid_overflow is false on the primary and secondary (
> > https://github.com/postgres/postgres/blob/dc899146dbf0e1d23fb24155a5155826ddce34c9/src/backend/storage/ipc/procarray.c#L1327)
> > I thought subxid_overflow only gets set if there are more than
> > PGPROC_MAX_CACHED_SUBXIDS (64) used in a given transaction.
> >
> > Should the replica be invalidating lastOverflowedXid if subxcnt goes to
> > zero in XLOG_RUNNING_XACTS? But if there's an outstanding snapshot with an
> > xmin that precedes lastOverflowedXid we might violate MVCC if we invalidate
> > this, so I wonder if we also need to check the snapshot with the lowest
> > xmin?
>
> lastOverflowedXid is the smallest subxid that possibly exists but
> possiblly not known to the standby. So if all top-level transactions
> older than lastOverflowedXid end, that means that all the
> subtransactions in doubt are known to have been ended.
Agreed
> XLOG_RUNNING_XACTS reports oldestRunningXid, which is the oldest
> running top-transaction. Standby expires xids in KnownAssignedXids
> array that precede to the oldestRunningXid. We are sure that all
> possiblly-overflown subtransactions are gone as well if the oldest xid
> is newer than the first overflowed subtransaction.
Agreed
> As a cross check, the following existing code in GetSnapshotData means
> that no overflow is not happening if the smallest xid in the known
> assigned list is larger than lastOverflowedXid, which agrees to the
> consideration above.
>
> procaray.c:2428
> > subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
> > xmax);
> >
> > if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
> > suboverflowed = true;
>
>
> If the discussion so far is correct, the following diff will fix the
> issue.
>
> diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
> index bd3c7a47fe..19682b73ec 100644
> --- a/src/backend/storage/ipc/procarray.c
> +++ b/src/backend/storage/ipc/procarray.c
> @@ -4463,6 +4463,12 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
> {
> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> KnownAssignedXidsRemovePreceding(xid);
> + /*
> + * reset lastOverflowedXid if we know transactions that have been possiblly
> + * running are being gone.
> + */
> + if (TransactionIdPrecedes(procArray->lastOverflowedXid, xid))
> + procArray->lastOverflowedXid = InvalidTransactionId;
> LWLockRelease(ProcArrayLock);
> }
So I agree with this fix.
It is however, an undocumented modularity violation. I think that is
acceptable because of the ProcArrayLock traffic, but needs to have a
comment to explain this at the call to
ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset
lastOverflowedXid", as well as a comment on the
ExpireOldKnownAssignedTransactionIds() function.
--
Simon Riggs http://www.EnterpriseDB.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2021-11-03 18:45:22 | Re: XTS cipher mode for cluster file encryption |
Previous Message | Jan Wieck | 2021-11-03 17:43:55 | Re: should we enable log_checkpoints out of the box? |