Re: pg_prepared_xact_status

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_prepared_xact_status
Date: 2017-09-29 07:57:28
Message-ID: 424501b5-4da5-9b56-04d1-54aa419b4eff@postgrespro.ru
Lists: pgsql-hackers

On 29.09.2017 06:02, Michael Paquier wrote:
> On Fri, Sep 29, 2017 at 1:53 AM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> In Postgres 10 we have txid_status function which returns status of
>> transaction by XID.
>> I wonder if it will be also useful to have similar function for 2PC
>> transactions which can operate with GID?
>> pg_prepared_xacts view allows to get information about prepared transaction
>> which are not yet committed or aborted.
>> But if transaction is committed, then there is no way now to find status of
>> this transaction.
> But you need to keep track of the transaction XID of each transaction
> happening on the remote nodes which are part of a global 2PC
> transaction, no?

Why? The GID already identifies a 2PC transaction at all participant
nodes.

> If you have this data at hand using txid_status is
> enough to guess if a prepared transaction has been marked as committed
> or prepared. And it seems to me that tracking those XIDs is mandatory
> anyway for other consistency checks.

It is certainly possible to maintain information about the XIDs involved
in a 2PC transaction, and it can really simplify recovery. But why is it
mandatory? Keeping track of XIDs requires some persistent storage.
So are you saying that the PostgreSQL 2PC mechanism is not complete and
the user needs to maintain some extra information to make it work?

Also, I think that it is not necessary to know the XIDs of all local
transactions involved in 2PC. It is enough to know the XID of the
coordinator's transaction.
It can be included in the GID (as I proposed at the end of my mail). In
this case txid_status can be used at the coordinator to check the global
status of the 2PC transaction.
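
A minimal sketch of what "including the coordinator's XID in the GID"
could look like. The `c<system-id>-x<xid>` layout and the names
`build_gid`, `parse_gid` and `GidParts` are assumptions made for
illustration only; nothing in PostgreSQL prescribes a GID format beyond
the length limit of PREPARE TRANSACTION.

```python
from typing import NamedTuple


class GidParts(NamedTuple):
    system_id: int  # coordinator's system identifier (assumed to be an integer)
    xid: int        # coordinator's transaction id, in the form txid_status() accepts


def build_gid(system_id: int, xid: int) -> str:
    """Encode the coordinator identity and XID into a GID (hypothetical format)."""
    gid = f"c{system_id:x}-x{xid:x}"
    # PREPARE TRANSACTION requires the identifier to be less than 200 bytes long.
    if len(gid.encode()) >= 200:
        raise ValueError("GID too long")
    return gid


def parse_gid(gid: str) -> GidParts:
    """Recover the coordinator identity and XID from a GID made by build_gid()."""
    node, sep, xid = gid.partition("-x")
    if not sep or not node.startswith("c"):
        raise ValueError(f"not a coordinator-encoded GID: {gid!r}")
    return GidParts(int(node[1:], 16), int(xid, 16))
```

A participant that finds such a GID in pg_prepared_xacts could parse it,
connect to the node identified by `system_id`, and pass `xid` to
txid_status() there to learn the global outcome.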

The idea of the pg_prepared_xact_status function is that it reports the
status of a 2PC transaction without imposing any requirements on the
format of GIDs and without needing any other information about the
participants of the 2PC transaction.

>
>> If crash happen during 2PC commit, then transaction can be in prepared state
>> at some nodes and committed/aborted at other nodes.
> Handling inconsistencies here is a tricky problem, particularly if a
> given transaction is marked as both committed and aborted on many
> nodes.
How can that be?
A transaction can be aborted only at the prepare stage.
In this case the coordinator should roll back the transaction everywhere.
There should be no committed transactions in this case.

The following situations are possible:
1. The transaction is prepared at some nodes and no information about it
is available at other nodes. It means that the crash happened at the
prepare stage and the transaction was not able to complete prepare at all
nodes. It is safe to abort the transaction in this case.
2. The transaction is prepared at some nodes and aborted at other nodes.
The same as 1: we can safely abort the transaction everywhere.
3. The transaction is prepared at all nodes. It means that the coordinator
crashed before sending the commit message. It is safe to commit the
transaction everywhere.
4. The transaction is prepared at some nodes and committed at other nodes.
The commit message was not delivered to, or not processed by, the other
nodes before the crash. It is safe to commit the transaction at all nodes.
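
The four cases above can be sketched as a small decision function. This
is only an illustration: the `State`, `Action` and `resolve` names are
hypothetical, and it assumes the status of every participant is known to
whoever runs the recovery. Treating a mix of "committed" with "aborted"
or "unknown" as an error is an added assumption, since such a mix should
not occur if commit is sent only after all nodes have prepared.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"
    UNKNOWN = "unknown"  # the node has no information about this GID


class Action(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def resolve(states: Iterable[State]) -> Action:
    """Decide what to do with a 2PC transaction given each participant's state."""
    seen = set(states)
    if not seen:
        raise ValueError("no participant states given")
    unfinished_prepare = seen & {State.ABORTED, State.UNKNOWN}
    if State.COMMITTED in seen:
        if unfinished_prepare:
            # Not one of the four cases: commit implies all nodes had prepared.
            raise ValueError("inconsistent states: committed together with "
                             "aborted/unknown")
        return Action.COMMIT   # case 4: finish the commit everywhere
    if unfinished_prepare:
        return Action.ABORT    # cases 1 and 2: prepare did not complete
    return Action.COMMIT       # case 3: prepared at all nodes
```

For example, `resolve([State.PREPARED, State.UNKNOWN])` yields
`Action.ABORT`, while `resolve([State.PREPARED, State.COMMITTED])` yields
`Action.COMMIT`.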

The problems with 2PC arise when the coordinator node is not available
but is expected to be recovered in the future.
In this case we may not have enough information to decide whether to
abort or commit a prepared transaction.
But that is a different story. We need to use 3PC or some other protocol
to prevent such a situation.

> The only way that I could think of would be to perform PITR to
> recover from the inconsistent states. So that's not an easy problem,
> becoming even more tricky if more than one transaction is involved and
> many transactions are inter-dependent across nodes.
>
>> 3. Same GID can be reused multiple times. In this case
>> pg_prepared_xact_status function will return incorrect result, because it
>> will return information about first global transaction with such GID after
>> checkpoint and not the recent one.
> Yeah, this argument alone is why I think that this is a dead-end approach.

Maybe. But I think that most real systems generate unique GIDs,
because otherwise it is difficult to address concurrency and recovery
issues.

>
>> There is actually alternative approach to recovery of 2PC transactions. We
>> can include coordinator identifier in GID (we can use GetSystemIdentifier()
>> to identify coordinator's node)
>> and XID of coordinator's transaction. In this case we can use txid_status()
>> to check status of transaction at coordinator. It eliminates need to scan
>> WAL to determine status of prepared transaction.
> + GetOldestRestartPoint(&lsn, &timeline);
> +
> + xlogreader = XLogReaderAllocate(&read_local_xlog_page, NULL);
> + if (!xlogreader)
> So you scan a bunch of records for each GID? This is really costly. I
> think that you would have an easier life by tracking the XID of each
> transaction involved remotely. In Postgres-XL, this is not a problem
> as XIDs are assigned globally and consistently. But you would gain in
> performance by keeping track of it on the coordinator node.

Yes, it can be costly.
But I just want to propose a more or less universal mechanism which
determines the status of a 2PC transaction based only on information that
already exists in the WAL, without requiring extra information stored in
the GID or in some other storage.
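
To make the trade-off concrete, here is a toy model of determining the
status by scanning a log. It is not the patch's code and does not read
real WAL: the log is just a list of hypothetical `(kind, gid)` tuples,
and `status_by_scan` is an illustrative name. It shows both the linear
cost of the scan and the GID-reuse limitation mentioned above.

```python
from typing import Iterable, Tuple

Record = Tuple[str, str]  # (record kind, gid); a stand-in for real WAL records

_FINAL = {"commit_prepared": "committed", "rollback_prepared": "aborted"}


def status_by_scan(log: Iterable[Record], gid: str) -> str:
    """Scan the log from its start and report the first outcome found for gid."""
    status = "unknown"
    for kind, rec_gid in log:          # cost grows with the length of the log
        if rec_gid != gid:
            continue
        if kind == "prepare":
            status = "prepared"
        elif kind in _FINAL:
            return _FINAL[kind]        # first match wins, even if gid is reused later
    return status
```

With the log `[("prepare", "g"), ("commit_prepared", "g"), ("prepare",
"g"), ("rollback_prepared", "g")]` the function reports "committed" for
"g", although the most recent transaction using that GID was rolled
back. With unique GIDs this ambiguity does not arise.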

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
