Re: out-of-order XID insertion in KnownAssignedXids

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: out-of-order XID insertion in KnownAssignedXids
Date: 2018-10-08 15:28:52
Message-ID: fc51532c-dbf5-dce0-b31c-82c7a0b837ed@postgrespro.ru
Lists: pgsql-hackers

On 08.10.2018 18:24, Andres Freund wrote:
>
> On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>
>> On 05.10.2018 11:04, Michael Paquier wrote:
>>> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>>>> As you can notice, XID 2004495308 is encountered twice, which causes an
>>>> error in KnownAssignedXidsAdd:
>>>>
>>>>     if (head > tail &&
>>>>         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>>>>     {
>>>>         KnownAssignedXidsDisplay(LOG);
>>>>         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>>>>     }
>>>>
>>>> The probability of this error is very small, but it can be reproduced
>>>> quite easily: just set a breakpoint in the debugger after the call to
>>>> MarkAsPrepared in twophase.c and then try to prepare any transaction.
>>>> MarkAsPrepared will add the GXACT to the proc array, and at this moment
>>>> there will be two entries in procarray with the same XID:
>>>>
>>>> [snip]
>>>>
>>>> Now the generated RUNNING_XACTS record contains duplicated XIDs.
>>> So, I have been doing exactly that, and if you trigger a manual
>>> checkpoint then things happen quite correctly if you let the first
>>> session finish:
>>> rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
>>> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
>>> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>>>
>>> If you still maintain the debugger after calling MarkAsPrepared, then
>>> the manual checkpoint would block. Now if you actually keep the
>>> debugger, and wait for a checkpoint timeout to happen, then I can see
>>> the incorrect record. It is impressive that your customer has been able
>>> to see that first, and then that you have been able to get into that
>>> state with simple steps.
>>>
>>>> I want to ask the opinion of the community about the best way of fixing
>>>> this problem. Should we avoid storing duplicated XIDs in procarray (by
>>>> invalidating the XID in the original pgxact) or eliminate/change the check
>>>> for duplicates in KnownAssignedXidsAdd (for example, just ignore
>>>> duplicates)?
>>> Hmmmmm... Please let me think through that first. It seems to me that
>>> the record should not be generated to begin with. At least I am able to
>>> confirm what you see.
>> The simplest way to fix the problem is to ignore duplicates before
>> adding them to KnownAssignedXids.
>> We perform a sort in this place in any case...
> I vehemently object to that as the proper course.
And what about adding a qsort to GetRunningTransactionData or
LogCurrentRunningXacts and excluding duplicates there?
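
To make the idea concrete, here is a minimal standalone sketch of sorting an
XID array and dropping duplicates. It is not a patch against either function:
Xid, xid_cmp and sort_and_dedup_xids are hypothetical names, and the
comparator is a plain numeric one that ignores XID wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t Xid;

/* Plain numeric ordering; deliberately ignores XID wraparound (assumption). */
static int
xid_cmp(const void *a, const void *b)
{
    Xid xa = *(const Xid *) a;
    Xid xb = *(const Xid *) b;

    if (xa < xb)
        return -1;
    if (xa > xb)
        return 1;
    return 0;
}

/*
 * Sort xids[0..n) in place and squeeze out duplicates.
 * Returns the number of distinct XIDs left at the front of the array.
 */
static size_t
sort_and_dedup_xids(Xid *xids, size_t n)
{
    size_t out = 0;     /* index of the last distinct XID kept so far */

    assert(xids != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(xids, n, sizeof(Xid), xid_cmp);

    for (size_t i = 1; i < n; i++)
    {
        /* After sorting, duplicates are adjacent, so one comparison suffices. */
        if (xids[i] != xids[out])
            xids[++out] = xids[i];
    }
    return out + 1;
}
```

For an input such as {607, 606, 607}, this leaves 606 and 607 at the front of
the array and returns 2.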

> Andres

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
