Re: Global snapshots

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, tsunakawa(dot)takay(at)fujitsu(dot)com, movead(dot)li(at)highgo(dot)ca, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global snapshots
Date: 2020-09-08 17:00:44
Message-ID: 53b1586317ae98ecd8c3383f2c9e7c16@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-09-08 14:48, Fujii Masao wrote:
> On 2020/09/08 19:36, Alexey Kondratov wrote:
>> On 2020-09-08 05:49, Fujii Masao wrote:
>>> On 2020/09/05 3:31, Alexey Kondratov wrote:
>>>>
>>>> Attached is a patch, which implements a plain 2PC in the
>>>> postgres_fdw and adds a GUC 'postgres_fdw.use_twophase'. Also it
>>>> solves these errors handling issues above and tries to add proper
>>>> comments everywhere. I think, that 0003 should be rebased on the top
>>>> of it, or it could be a first patch in the set, since it may be used
>>>> independently. What do you think?
>>>
>>> Thanks for the patch!
>>>
>>> Sawada-san was proposing another 2PC patch at [1]. Do you have any
>>> thoughts
>>> about pros and cons between your patch and Sawada-san's?
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/CA+fd4k4z6_B1ETEvQamwQhu4RX7XsrN5ORL7OhJ4B5B6sW-RgQ@mail.gmail.com
>>
>> Thank you for the link!
>>
>> After a quick look on the Sawada-san's patch set I think that there
>> are two major differences:
>
> Thanks for sharing your thought! As far as I read your patch quickly,
> I basically agree with your this view.
>
>
>>
>> 1. There is a built-in foreign xacts resolver in the [1], which should
>> be much more convenient from the end-user perspective. It involves
>> huge in-core changes and additional complexity that is of course worth
>> of.
>>
>> However, it's still not clear for me that it is possible to resolve
>> all foreign prepared xacts on the Postgres' own side with a 100%
>> guarantee. Imagine a situation when the coordinator node is actually a
>> HA cluster group (primary + sync + async replica) and it failed just
>> after PREPARE stage of after local COMMIT. In that case all foreign
>> xacts will be left in the prepared state. After failover process
>> complete synchronous replica will become a new primary. Would it have
>> all required info to properly resolve orphan prepared xacts?
>
> IIUC, yes, the information required for automatic resolution is
> WAL-logged and the standby tries to resolve those orphan transactions
> from WAL after the failover. But Sawada-san's patch provides
> the special function for manual resolution, so there may be some cases
> where manual resolution is necessary.
>

I've found a note about manual resolution in the v25 0002:

+After that we prepare all foreign transactions by calling
+PrepareForeignTransaction() API. If we failed on any of them we change
to
+rollback, therefore at this time some participants might be prepared
whereas
+some are not prepared. The former foreign transactions need to be
resolved
+using pg_resolve_foreign_xact() manually and the latter ends
transaction
+in one-phase by calling RollbackForeignTransaction() API.

but it's not yet clear for me.

>
> Implementing 2PC feature only inside postgres_fdw seems to cause
> another issue; COMMIT PREPARED is issued to the remote servers
> after marking the local transaction as committed
> (i.e., ProcArrayEndTransaction()).
>

According to the Sawada-san's v25 0002 the logic is pretty much the same
there:

+2. Pre-Commit phase (1st phase of two-phase commit)

+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.

+4. Post-Commit Phase (2nd phase of two-phase commit)

Brief look at the code confirms this scheme. IIUC, AtEOXact_FdwXact /
FdwXactParticipantEndTransaction happens after ProcArrayEndTransaction()
in the CommitTransaction(). Thus, I don't see many difference between
these approach and CallXactCallbacks() usage regarding this point.

> Is this safe? This issue happens
> because COMMIT PREPARED is issued via
> CallXactCallbacks(XACT_EVENT_COMMIT) and that CallXactCallbacks()
> is called after ProcArrayEndTransaction().
>

Once the transaction is committed locally any ERROR (or higher level
message) will be escalated to PANIC. And I do see possible ERROR level
messages in the postgresCommitForeignTransaction() for example:

+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ ereport(ERROR, (errmsg("could not commit transaction on server %s",
+ frstate->server->servername)));

I don't think that it's very convenient to get a PANIC every time we
failed to commit one of the prepared foreign xacts, since it could be
not so rare in the distributed system. That's why I tried to get rid of
possible ERRORs as far as possible in my proposed patch.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-09-08 17:06:17 Re: Logical Replication - detail message with names of missing columns
Previous Message David Rowley 2020-09-08 16:30:15 Re: Optimising compactify_tuples()