Re: out-of-order XID insertion in KnownAssignedXids

From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: out-of-order XID insertion in KnownAssignedXids
Date: 2018-10-08 16:30:49
Message-ID: 20181008163049.nvaui5kjrsav2ojn@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-10-08 18:28:52 +0300, Konstantin Knizhnik wrote:
>
>
> On 08.10.2018 18:24, Andres Freund wrote:
> >
> > On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> > >
> > > On 05.10.2018 11:04, Michael Paquier wrote:
> > > > On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
> > > > > > As you can see, XID 2004495308 is encountered twice, which causes an
> > > > > > error in KnownAssignedXidsAdd:
> > > > >
> > > > > >     if (head > tail &&
> > > > > >         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
> > > > > >     {
> > > > > >         KnownAssignedXidsDisplay(LOG);
> > > > > >         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
> > > > > >     }
> > > > >
> > > > > > The probability of this error is very small, but it can be reproduced
> > > > > > quite easily: just set a breakpoint in the debugger after the call to
> > > > > > MarkAsPrepared in twophase.c and then try to prepare any transaction.
> > > > > > MarkAsPrepared will add the GXACT to the proc array, and at this moment
> > > > > > there will be two entries in procarray with the same XID:
> > > > >
> > > > > [snip]
> > > > >
> > > > > > Now the generated RUNNING_XACTS record contains duplicated XIDs.
> > > > So, I have been doing exactly that, and if you trigger a manual
> > > > checkpoint then things happen quite correctly if you let the first
> > > > session finish:
> > > > rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
> > > > 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> > > > latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
> > > >
> > > > > If you still maintain the debugger after calling MarkAsPrepared, then
> > > > > the manual checkpoint would block. Now if you actually keep the
> > > > > debugger, and wait for a checkpoint timeout to happen, then I can see
> > > > > the incorrect record. It is impressive that your customer has been able
> > > > > to see that first, and then that you have been able to get into that
> > > > > state with simple steps.
> > > >
> > > > > > I want to ask the community's opinion about the best way of fixing this
> > > > > > problem. Should we avoid storing duplicated XIDs in procarray (by
> > > > > > invalidating the XID in the original pgxact) or eliminate/change the
> > > > > > check for duplicates in KnownAssignedXidsAdd (for example, just ignore
> > > > > > duplicates)?
> > > > > Hmmmmm... Please let me think through that first. It seems to me that
> > > > > the record should not be generated to begin with. At least I am able to
> > > > > confirm what you see.
> > > > The simplest way to fix the problem is to ignore duplicates before
> > > > adding them to KnownAssignedXids.
> > > > We in any case perform a sort in this place...
> > I vehemently object to that as the proper course.
> And what about adding qsort to GetRunningTransactionData or
> LogCurrentRunningXacts and excluding duplicates here?

Sounds less terrible, but still pretty bad. I think we should fix the
underlying data inconsistency, not paper over it a couple hundred meters
away.

Greetings,

Andres Freund
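
For readers following the thread, here is a minimal standalone sketch of why a duplicated XID in a RUNNING_XACTS record trips the check quoted above. It is not PostgreSQL source: xid_t, xid_list, xid_follows_or_equals, xid_list_add and xid_list_add_batch are hypothetical stand-ins, and the comparison is a simplified modulo-2^32 test that ignores PostgreSQL's special handling of permanent XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;          /* hypothetical stand-in for TransactionId */

#define MAX_XIDS 64

typedef struct
{
    xid_t  xids[MAX_XIDS];
    size_t head;                 /* index one past the last stored entry */
    size_t tail;                 /* index of the oldest stored entry */
} xid_list;

/*
 * Simplified wraparound-aware comparison: true if a is logically >= b.
 * Assumption: both XIDs are "normal"; special XIDs are not handled.
 */
static bool
xid_follows_or_equals(xid_t a, xid_t b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Append one XID. Returns false, without storing anything, when the new
 * XID is not strictly newer than the last stored one. This mirrors the
 * shape of the quoted check, where the real code raises an ERROR instead.
 */
static bool
xid_list_add(xid_list *list, xid_t xid)
{
    assert(list->head < MAX_XIDS);

    if (list->head > list->tail &&
        xid_follows_or_equals(list->xids[list->head - 1], xid))
        return false;            /* out-of-order or duplicate insertion */

    list->xids[list->head++] = xid;
    return true;
}

/*
 * Feed an already sorted batch of XIDs, as a standby would when replaying
 * a running-xacts snapshot. Returns the index of the first rejected entry,
 * or n if every entry was accepted.
 */
static size_t
xid_list_add_batch(xid_list *list, const xid_t *xids, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (!xid_list_add(list, xids[i]))
            return i;
    }
    return n;
}
```

Because the comparison is "follows or equals", sorting the incoming XIDs does not help when the same XID appears twice: the second copy is rejected at the point of insertion. That is why the choice in this thread is between filtering duplicates somewhere downstream and not producing them in the procarray in the first place.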
