Re: Transactions involving multiple postgres foreign servers, take 2

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "ikedamsh(at)oss(dot)nttdata(dot)com" <ikedamsh(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-01-27 01:29:52
Message-ID: 8f17d1fd-5160-75ad-ce85-24ae498e7a9c@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/01/18 14:54, Masahiko Sawada wrote:
> On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>>
>> For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :
>>
>> + entry->changing_xact_state = true;
>> ...
>> + entry->changing_xact_state = abort_cleanup_failure;
>>
>> I don't see return statement in between the two assignments. I wonder why entry->changing_xact_state is set to true, and later being assigned again.
>
> Because postgresRollbackForeignTransaction() can get called again in
> case where an error occurred during aborting and cleanup the
> transaction. For example, if an error occurred when executing ABORT
> TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
> postgresRollbackForeignTransaction() will get called again while
> entry->changing_xact_state is still true. Then the entry will be
> caught by the following condition and cleaned up:
>
> /*
> * If connection is before starting transaction or is already unsalvageable,
> * do only the cleanup and don't touch it further.
> */
> if (entry->changing_xact_state)
> {
> pgfdw_cleanup_after_transaction(entry);
> return;
> }
>
>>
>> For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :
>>
>> bq. This commits introduces to new background processes: foreign
>>
>> commits introduces to new -> commit introduces two new
>
> Fixed.
>
>>
>> +FdwXactExistsXid(TransactionId xid)
>>
>> Since Xid is the parameter to this method, I think the Xid suffix can be dropped from the method name.
>
> But there is already a function named FdwXactExists()?
>
> bool
> FdwXactExists(Oid dbid, Oid serverid, Oid userid)
>
> As far as I read other code, we already have such functions that have
> the same functionality but have different arguments. For instance,
> SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
> we can leave as it is but is it better to have like
> FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?
>
>>
>> + * Portions Copyright (c) 2020, PostgreSQL Global Development Group
>>
>> Please correct year in the next patch set.
>
> Fixed.
>
>>
>> +FdwXactLauncherRequestToLaunch(void)
>>
>> Since the launcher's job is to 'launch', I think the Launcher can be omitted from the method name.
>
> Agreed. How about FdwXactRequestToLaunchResolver()?
>
>>
>> +/* Report shared memory space needed by FdwXactRsoverShmemInit */
>> +Size
>> +FdwXactRslvShmemSize(void)
>>
>> Are both Rsover and Rslv referring to resolver ? It would be better to use whole word which reduces confusion.
>> Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or FdwXactResolveShmemInit)
>
> Agreed. I realized that these functions are the launcher's function,
> not resolver's. So I'd change to FdwXactLauncherShmemSize() and
> FdwXactLauncherShmemInit() respectively.
>
>>
>> +fdwxact_launch_resolver(Oid dbid)
>>
>> The above method is not in camel case. It would be better if method names are consistent (in casing).
>
> Fixed.
>
>>
>> + errmsg("out of foreign transaction resolver slots"),
>> + errhint("You might need to increase max_foreign_transaction_resolvers.")));
>>
>> It would be nice to include the value of max_foreign_xact_resolvers
>
> I agree it would be nice but looking at other code we don't include
> the value in this kind of messages.
>
>>
>> For fdwxact_resolver_onexit():
>>
>> + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
>> + fdwxact->locking_backend = InvalidBackendId;
>> + LWLockRelease(FdwXactLock);
>>
>> There is no call to method inside the for loop which may take time. I wonder if the lock can be obtained prior to the for loop and released coming out of the for loop.
>
> Agreed.
>
>>
>> +FXRslvLoop(void)
>>
>> Please use Resolver instead of Rslv
>
> Fixed.
>
>>
>> + FdwXactResolveFdwXacts(held_fdwxacts, nheld);
>>
>> Fdw and Xact are repeated twice each in the method name. Probably the method name can be made shorter.
>
> Fixed.

You fixed some issues. But maybe you forgot to attach the latest patches?

I'm reading 0001 and 0002 patches to pick up the changes for postgres_fdw that worth applying independent from 2PC feature. If there are such changes, IMO we can apply them in advance, and which would make the patches simpler.

+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ ereport(ERROR, (errmsg("could not commit transaction on server %s",
+ frstate->server->servername)));

You changed the code this way because you want to include the server name in the error message? I agree that it's helpful to report also the server name that caused an error. OTOH, since this change gets rid of call to pgfdw_rerport_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement.

Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when other commands fail.

Of course, if this change is not essential, we can skip doing this in the first version.

- /*
- * Regardless of the event type, we can now mark ourselves as out of the
- * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
- * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
- */
- xact_got_connection = false;

With this change, xact_got_connection seems to never be set to false. Doesn't this break pgfdw_subxact_callback() using xact_got_connection?

+ /* Also reset cursor numbering for next transaction */
+ cursor_number = 0;

Originally this variable is reset to 0 once per transaction end. But with the patch, it's reset to 0 every time when a foreign transaction ends at each connection. This change would be harmless fortunately in practice, but seems not right theoretically.

This makes me wonder if new FDW API is not good at handling the case where some operations need to be performed once per transaction end.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-01-27 01:40:39 Re: wiki:PostgreSQL_14_Open_Items
Previous Message Bharath Rupireddy 2021-01-27 01:17:15 Re: logical replication worker accesses catalogs in error context callback