Re: out-of-order XID insertion in KnownAssignedXids

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: out-of-order XID insertion in KnownAssignedXids
Date: 2018-10-09 14:26:49
Message-ID: 1e1d56e2-b056-5fa1-bc89-a6bb3836a1c8@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09.10.2018 10:52, Michael Paquier wrote:
> On Tue, Oct 09, 2018 at 02:59:00PM +0900, Michael Paquier wrote:
>> +1. I am looking at how to make that possible.
> And so... Going through a bit of investigation the problem is that each
> 2PC transaction preparing finishes by putting the procarray in a state
> where there are two entries referring to the same transaction running as
> MarkAsPrepared adds an extra entry on the top of the one already
> existing. This is expected though per the comments in
> ProcArrayClearTransaction(), as it assumes that MyProc can be safely
> cleaned and it assumes that there is a duplicated entry. Then,
> GetRunningTransactionData() ignores the assumption that those duplicate
> entries can exist, which makes the snapshot data taken as incorrect,
> generating those incorrect XLOG_RUNNING_XACT records.
>
> The most simple fix I can think of first is to tweak the origin of the
> problem in GetRunningTransactionData() so as those duplicate XIDs are
> ignored if found as there has to be a state between the time
> MarkAsPrepared() inserted the 2PC entry in the procarray and the time
> ProcArrayClearTransaction() gets called.
> --
> Michael

Right now GetRunningTransactionData is used only once in Postgres code - in LogStandbySnapshot to log RUNNING_XACTS record.
So if this situation is not changed, there will be no difference whether to perform sort and exclude duplicates in LogStandbySnapsho
or GetRunningTransactionData.

But GetRunningTransactionData may be used in some other cases... For example I have used it in my snapfs Postgres extensions (fast file-level snapshots)
to check if there are no more active transactions in the system. In my case presence of duplicates is not important and ordering of XIDs is not needed.
But performance of GetRunningTransactionData is also not critical. But I can suspect that there can be cases when it is critical and doing unneeded qsort is not desired.
There may be thousands of active transactions and sorting their XIDs is notinstantaneous.

Also please take in account that in most cases XIDs in RUNNINGS_XACTS record are not processed at all (only if we perform recovery from backup and do not reach consistent point yet). This is why this bug was not detected before.

So I completely understand the argument that it is better to eliminate source of the problem (presence of duplicates in RUNNING_XACTS record)
but not sure that in this particular case it is really the best solution. If presence of duplicates is considered to be acceptable for procarray, why we
can not accept it for RUNNING_XACTS record?

I am not insisting that skipping duplicates in
ProcArrayApplyRecoveryInfo is the right place (but at least is seems to
be the most efficient alternative). But I also do not fill that moving
sort to GetRunningTransactionData and elimination of duplicates here
(requires one more scan and copying of the whole array) is principally
better...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-10-09 14:54:44 Re: TAP tests for pg_verify_checksums
Previous Message Tom Lane 2018-10-09 14:22:08 Re: Support custom socket directory in pg_upgrade