Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Ildar Musin <ildar(at)adjust(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
Date: 2019-04-17 10:17:35
Message-ID: CAD21AoDn98axH1bEoMnte+S7WWR=nsmOpjz1WGH-NvJi4aLu3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 31, 2019 at 7:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jan 29, 2019 at 5:47 PM Ildar Musin <ildar(at)adjust(dot)com> wrote:
> >
> > Hello,
> >
> > The patch needs rebase as it doesn't apply to the current master. I applied it
> > to the older commit to test it. It worked fine so far.
>
> Thank you for testing the patch!
>
> >
> > I found one bug though which would cause resolver to finish by timeout even
> > though there are unresolved foreign transactions in the list. The
> > `fdw_xact_exists()` function expects database id as the first argument and xid
> > as the second. But everywhere it is called arguments specified in the different
> > order (xid first, then dbid). Also function declaration in header doesn't
> > match its definition.
>
> Will fix.
>
> >
> > There are some other things I found.
> > * In `FdwXactResolveAllDanglingTransactions()` variable `n_resolved` is
> > declared as bool but used as integer.
> > * In fdwxact.c's module comment there are `FdwXactRegisterForeignTransaction()`
> > and `FdwXactMarkForeignTransactionModified()` functions mentioned that are
> > not there anymore.
> > * In documentation (storage.sgml) there is no mention of `pg_fdw_xact`
> > directory.
> >
> > Couple of stylistic notes.
> > * In `FdwXactCtlData struct` there are both camel case and snake case naming
> > used.
> > * In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with
> > `TransactionIdIsValid(xid)`.
> > * In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of format
> > string instead of being processed by `sprintf` as an extra argument.
> >
>
> I'll incorporate them at the next patch set.
>
> > I'll continue looking into the patch. Thanks!
>
> Thanks. Actually I'm updating the patch set, changing API interface as
> I proposed before and improving the document and README. I'll submit
> the latest patch next week.
>

Sorry for the very late. Attached updated version patches.

The basic mechanism has not been changed since the previous version.
But the updated version patch uses the single wait queue instead of
two queues (active and retry) which were used in the previous version.

Every backends processes has a timestamp in PGPROC
(fdwXactNextResolutionTs), that is the time when they expect to be
processed by foreign resolver process at. Entries in the wait queue is
ordered by theirs timestamps. The wait queue and timestamp are used
after a backend process prepared all transactions on foreign servers
and wait for all of them to be resolved.

Backend processes who are committing/aborting the distributed
transaction insert itself to the wait queue
(FdwXactRslvCtl->fdwxact_queue) with the current timestamp, and then
request to launch a new resolver process if not launched yet. If there
is resolver connecting to the same database they just set its latch.
The wait queue is protected by LWLock FdwXactResolutionLock. Then the
backend sleep until either user requests to cancel (press ctrl-c) or
waken up by resolver process.

Foreign resolver process continue to poll the wait queue, checking if
there is any waiter on the database that the resolver process connects
to. If there is a waiter, fetches it and check its timestamp. If the
current timestamp goes over its timestamp, the resolver process start
to resolve all foreign transactions. Usually backends processes insert
itself to wait queue first then wake up the resolver and they use the
same wall-clock, so the resolver can fetch the waiter just inserted.
Once all foreign transactions are resolved, the resolver process
delete the backend entry from the wait queue, and then wake up the
waiting backend.

On failure during foreign transaction resolution, while the backend is
still sleeping, the resolver process removes and inserts the backend
with the new timestamp (its timestamp
foreign_transaction_resolution_interval) to appropriate position in
the wait queue. This mechanism ensures that a distributed transaction
is resolved as soon as the waiter inserted while ensuring that the
resolver can retry to resolve the failed foreign transactions at a
interval of foreign_transaction_resolution_interval time.

For handling in-doubt transactions, I've removed the automatically
foreign transaction resolution code from the first version patch since
it's not essential feature and we can add it later. Therefore user
needs to resolve unresolved foreign transactions manually using by
pg_resolve_fdwxacts() function in three cases: where the foreign
server crashed or we lost connectibility to it during preparing
foreign transaction, where the coordinator node crashed during
preparing/resolving the foreign transaction and where user canceled to
resolve the foreign transaction.

For foreign transaction resolver processes, they exit if they don't
have any foreign transaction to resolve longer than
foreign_transaction_resolver_timeout. Since we cannot drop a database
while a resolver process is connecting to we can stop it call by
pg_stop_fdwxact_resolver() function.

The comment at top of fdwxact.c file describes about locking mechanism
and recovery, and src/backend/fdwxact/README descries about status
transition of FdwXact.

Also the wiki page[1] describes how to use this feature with some examples.

[1] https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
v23-0001-Keep-track-of-writing-on-non-temporary-relation.patch application/octet-stream 2.2 KB
v23-0004-Add-regression-tests-for-atomic-commit.patch application/octet-stream 8.1 KB
v23-0003-postgres_fdw-supports-atomic-commit-APIs.patch application/octet-stream 45.4 KB
v23-0002-Support-atomic-commit-among-multiple-foreign-ser.patch application/octet-stream 232.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-04-17 10:19:29 Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Previous Message Amit Langote 2019-04-17 10:00:30 Re: hyrax vs. RelationBuildPartitionDesc