Re: Global snapshots

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, movead(dot)li(at)highgo(dot)ca, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global snapshots
Date: 2020-07-07 06:40:20
Message-ID: CAA4eK1JVsMWUD4q-b+vawehFzJb6Qg0AOGq7qOGL_gm6EEhdJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 3, 2020 at 12:18 PM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Sat, 20 Jun 2020 at 21:21, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> >
> > > Also, can you let us know if this
> > > > supports 2PC in some way and if so how is it different from what the
> > > > other thread on the same topic [1] is trying to achieve?
> > > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > > 2PC machinery. Now I'd not judge which approach is better.
> > >
> >
>
> Sorry for being late.
>

No problem, your summarization, and comparisons of both approaches are
quite helpful.

>
> I studied this patch and did a simple comparison between this patch
> (0002 patch) and my 2PC patch.
>
> In terms of atomic commit, the features that are not implemented in
> this patch but in the 2PC patch are:
>
> * Crash safe.
> * PREPARE TRANSACTION command support.
> * Query cancel during waiting for the commit.
> * Automatically in-doubt transaction resolution.
>
> On the other hand, the feature that is implemented in this patch but
> not in the 2PC patch is:
>
> * Executing PREPARE TRANSACTION (and other commands) in parallel
>
> When the 2PC patch was proposed, IIRC it was like this patch (0002
> patch). I mean, it changed only postgres_fdw to support 2PC. But after
> discussion, we changed the approach to have the core manage foreign
> transaction for crash-safe. From my perspective, this patch has a
> minimum implementation of 2PC to work the global snapshot feature and
> has some missing features important for supporting crash-safe atomic
> commit. So I personally think we should consider how to integrate this
> global snapshot feature with the 2PC patch, rather than improving this
> patch if we want crash-safe atomic commit.
>

Okay, but isn't there some advantage with this approach (manage 2PC at
postgres_fdw level) as well which is that any node will be capable of
handling global transactions rather than doing them via central
coordinator? I mean any node can do writes or reads rather than
probably routing them (at least writes) via coordinator node. Now, I
agree that even if this advantage is there in the current approach, we
can't lose the crash-safety aspect of other approach. Will you be
able to summarize what was the problem w.r.t crash-safety and how your
patch has dealt it?

> Looking at the commit procedure with this patch:
>
> When starting a new transaction on a foreign server, postgres_fdw
> executes pg_global_snapshot_import() to import the global snapshot.
> After some work, in pre-commit phase we do:
>
> 1. generate global transaction id, say 'gid'
> 2. execute PREPARE TRANSACTION 'gid' on all participants.
> 3. prepare global snapshot locally, if the local node also involves
> the transaction
> 4. execute pg_global_snapshot_prepare('gid') for all participants
>
> During step 2 to 4, we calculate the maximum CSN from the CSNs
> returned from each pg_global_snapshot_prepare() executions.
>
> 5. assign global snapshot locally, if the local node also involves the
> transaction
> 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
>
> Then, we commit locally (i.g. mark the current transaction as
> committed in clog).
>
> After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> participants.
>

As per my current understanding, the overall idea is as follows. For
global transactions, pg_global_snapshot_prepare('gid') will set the
transaction status as InDoubt and generate CSN (let's call it NodeCSN)
at the node where that function is executed, it also returns the
NodeCSN to the coordinator. Then the coordinator (the current
postgres_fdw node on which write transaction is being executed)
computes MaxCSN based on the return value (NodeCSN) of prepare
(pg_global_snapshot_prepare) from all nodes. It then assigns MaxCSN
to each node. Finally, when Commit Prepared is issued for each node
that MaxCSN will be written to each node including the current node.
So, with this idea, each node will have the same view of CSN value
corresponding to any particular transaction.

For Snapshot management, the node which receives the query generates a
CSN (CurrentCSN) and follows the simple rule that the tuple having a
xid with CSN lesser than CurrentCSN will be visible. Now, it is
possible that when we are examining a tuple, the CSN corresponding to
xid that has written the tuple has a value as INDOUBT which will
indicate that the transaction is yet not committed on all nodes. And
we wait till we get the valid CSN value corresponding to xid and then
use it to check if the tuple is visible.

Now, one thing to note here is that for global transactions we
primarily rely on CSN value corresponding to a transaction for its
visibility even though we still maintain CLOG for local transaction
status.

Leaving aside the incomplete parts and or flaws of the current patch,
does the above match the top-level idea of this patch? I am not sure
if my understanding of this patch at this stage is completely correct
or whether we want to follow the approach of this patch but I think at
least lets first be sure if such a top-level idea can achieve what we
want to do here.

> Considering how to integrate this global snapshot feature with the 2PC
> patch, what the 2PC patch needs to at least change is to allow FDW to
> store an FDW-private data that is passed to subsequent FDW transaction
> API calls. Currently, in the current 2PC patch, we call Prepare API
> for each participant servers one by one, and the core pass only
> metadata such as ForeignServer, UserMapping, and global transaction
> identifier. So it's not easy to calculate the maximum CSN across
> multiple transaction API calls. I think we can change the 2PC patch to
> add a void pointer into FdwXactRslvState, struct passed from the core,
> in order to store FDW-private data. It's going to be the maximum CSN
> in this case. That way, at the first Prepare API calls postgres_fdw
> allocates the space and stores CSN to that space. And at subsequent
> Prepare API calls it can calculate the maximum of csn, and then is
> able to the step 3 to 6 when preparing the transaction on the last
> participant. Another idea would be to change 2PC patch so that the
> core passes a bunch of participants grouped by FDW.
>

IIUC with this the coordinator needs the communication with the nodes
twice at the prepare stage, once to prepare the transaction in each
node and get CSN from each node and then to communicate MaxCSN to each
node? Also, we probably need InDoubt CSN status at prepare phase to
make snapshots and global visibility work.

> I’ve not read this patch deeply yet and have considered it without any
> coding but my first feeling is not hard to integrate this feature with
> the 2PC patch.
>

Okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-07-07 06:41:41 Re: Cache lookup errors with functions manipulation object addresses
Previous Message Bharath Rupireddy 2020-07-07 06:35:31 Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks