Re: Global snapshots

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, movead(dot)li(at)highgo(dot)ca, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global snapshots
Date: 2020-07-08 12:34:54
Message-ID: CAA4eK1+tMDXU9WtQ0kRgyk1jPr+K-7FowMdFB8=t-ptUEQa-mQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 8, 2020 at 11:16 AM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Tue, 7 Jul 2020 at 15:40, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > Okay, but isn't there some advantage with this approach (managing 2PC
> > at the postgres_fdw level) as well, which is that any node will be
> > capable of handling global transactions rather than doing them via a
> > central coordinator? I mean any node can do writes or reads rather
> > than probably routing them (at least writes) via a coordinator node.
>
> The postgres server where the client started the transaction works as
> the coordinator node. I think this is true for both this patch and
> that 2PC patch. From the perspective of atomic commit, any node will
> be capable of handling global transactions in both approaches.
>

Okay, but then we probably need to ensure that the GID is unique even
when it is generated on different nodes? I don't know if that is
ensured.

> > Now, I
> > agree that even if this advantage is there in the current approach, we
> > can't lose the crash-safety aspect of the other approach. Will you be
> > able to summarize what the problem was w.r.t. crash-safety and how your
> > patch has dealt with it?
>
> Since this patch performs 2PC without any logging, foreign
> transactions prepared on foreign servers are left over without any
> clue if the coordinator crashes during commit. Therefore, after a
> restart, the user will need to find and resolve the in-doubt foreign
> transactions manually.
>

Okay, but is that because we can't directly write WAL in postgres_fdw,
or is there some other reason for not doing so?
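
To make the gap concrete, here is a toy, file-based illustration (nothing
from either patch; all names are made up) of the kind of durable record a
coordinator would need to write before committing locally, so that after a
crash it could find and resolve the prepared foreign transactions instead
of leaving them for manual cleanup:

#include <stdio.h>
#include <unistd.h>

/*
 * Toy stand-in for durable logging of prepared foreign transactions.
 * As I understand it, the 2PC patch does this with WAL records in core;
 * a plain file plus fsync() is used here only to illustrate the idea.
 */
static int
record_prepared_fdwxact(const char *server, const char *gid)
{
	FILE	   *f = fopen("prepared_fdwxacts.log", "a");

	if (f == NULL)
		return -1;
	fprintf(f, "%s\t%s\n", server, gid);
	fflush(f);
	fsync(fileno(f));			/* must hit disk before the local commit */
	fclose(f);
	return 0;
}

int
main(void)
{
	/* Called once per participant, after PREPARE TRANSACTION succeeds. */
	record_prepared_fdwxact("shard1", "px_42");
	record_prepared_fdwxact("shard2", "px_42");
	return 0;
}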

>
> >
> > > Looking at the commit procedure with this patch:
> > >
> > > When starting a new transaction on a foreign server, postgres_fdw
> > > executes pg_global_snapshot_import() to import the global snapshot.
> > > After some work, in pre-commit phase we do:
> > >
> > > 1. generate global transaction id, say 'gid'
> > > 2. execute PREPARE TRANSACTION 'gid' on all participants.
> > > 3. prepare the global snapshot locally, if the local node is also
> > > involved in the transaction
> > > 4. execute pg_global_snapshot_prepare('gid') for all participants
> > >
> > > During steps 2 to 4, we calculate the maximum CSN from the CSNs
> > > returned by each pg_global_snapshot_prepare() execution.
> > >
> > > 5. assign the global snapshot locally, if the local node is also
> > > involved in the transaction
> > > 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
> > >
> > > Then, we commit locally (i.e. mark the current transaction as
> > > committed in clog).
> > >
> > > After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> > > participants.
> > >
> >
> > As per my current understanding, the overall idea is as follows. For
> > global transactions, pg_global_snapshot_prepare('gid') will set the
> > transaction status to InDoubt and generate a CSN (let's call it NodeCSN)
> > at the node where that function is executed; it also returns the
> > NodeCSN to the coordinator. Then the coordinator (the current
> > postgres_fdw node on which the write transaction is being executed)
> > computes MaxCSN based on the return value (NodeCSN) of prepare
> > (pg_global_snapshot_prepare) from all nodes. It then assigns MaxCSN
> > to each node. Finally, when COMMIT PREPARED is issued for each node,
> > that MaxCSN will be written on each node, including the current node.
> > So, with this idea, each node will have the same view of the CSN value
> > corresponding to any particular transaction.
> >
> > For snapshot management, the node which receives the query generates a
> > CSN (CurrentCSN) and follows the simple rule that a tuple whose xid has
> > a CSN less than CurrentCSN will be visible. Now, it is possible that
> > when we are examining a tuple, the CSN corresponding to the xid that
> > wrote the tuple is INDOUBT, which indicates that the transaction is not
> > yet committed on all nodes. In that case we wait till we get the valid
> > CSN value corresponding to that xid and then use it to check if the
> > tuple is visible.
> >
> > Now, one thing to note here is that for global transactions we
> > primarily rely on CSN value corresponding to a transaction for its
> > visibility even though we still maintain CLOG for local transaction
> > status.
> >
> > Leaving aside the incomplete parts and/or flaws of the current patch,
> > does the above match the top-level idea of this patch?
>
> I'm still studying this patch but your understanding seems right to me.
>

Cool. While studying it, if you can also think about whether this
approach differs from the global-coordinator-based approach, that would
be great. Here is my initial thought: apart from other reasons, a
global-coordinator-based design can help us do global transaction
management and snapshots. The coordinator can allocate xids for each
transaction, collect the list of running xacts (or CSNs) from each node,
and then prepare a global snapshot that can be used to perform any
transaction.
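
As a toy, self-contained illustration of that coordinator-based idea (all
names and the hard-coded node data below are made up), the coordinator
could hand out xids itself and then build the global snapshot as the union
of the running-xact lists reported by the nodes:

#include <stdint.h>
#include <stdio.h>

#define MAX_XIDS 8

/* What each node would report back to the coordinator over the network. */
typedef struct NodeRunningXacts
{
	int			nxids;
	uint32_t	xids[MAX_XIDS];
} NodeRunningXacts;

int
main(void)
{
	/* Hard-coded stand-ins for the per-node reports. */
	NodeRunningXacts nodes[] = {
		{2, {101, 104}},
		{1, {207}},
		{3, {301, 302, 305}},
	};
	int			nnodes = sizeof(nodes) / sizeof(nodes[0]);
	uint32_t	global_xip[3 * MAX_XIDS];
	int			global_cnt = 0;

	/*
	 * The coordinator's global snapshot is (conceptually) the union of the
	 * running-xact lists: an xid still running on any node is treated as
	 * in-progress, hence invisible, on every node.
	 */
	for (int i = 0; i < nnodes; i++)
		for (int j = 0; j < nodes[i].nxids; j++)
			global_xip[global_cnt++] = nodes[i].xids[j];

	printf("global snapshot: %d in-progress xids (first is %u)\n",
		   global_cnt, global_xip[0]);
	return 0;
}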

OTOH, in the design proposed in this patch, we don't need any
coordinator to manage transactions and snapshots because each node's
current CSN is sufficient for snapshots and visibility, as explained
above. Of course, this assumes that there is no clock skew across the
nodes, or that we somehow take care of it (note that in the proposed
patch the CSN is a timestamp).
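
For contrast, here is a minimal sketch of the decentralized rule, under
the assumption that the CSN really is just a clock reading (the names
below are illustrative, not the patch's): the snapshot CSN is the local
timestamp, and a committed tuple is visible when its commit CSN precedes
it, which is exactly why clock skew between nodes matters.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef uint64_t GlobalCSN;

static GlobalCSN
take_snapshot_csn(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);		/* CSN == current timestamp */
	return (GlobalCSN) ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

static bool
tuple_is_visible(GlobalCSN tuple_commit_csn, GlobalCSN snapshot_csn)
{
	/* visible iff the writer committed "before" the snapshot was taken */
	return tuple_commit_csn < snapshot_csn;
}

int
main(void)
{
	GlobalCSN	snap = take_snapshot_csn();

	printf("visible: %d\n", tuple_is_visible(snap - 1000, snap));
	return 0;
}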

> > I am not sure
> > if my understanding of this patch at this stage is completely correct
> > or whether we want to follow the approach of this patch, but I think at
> > least let's first be sure whether such a top-level idea can achieve what
> > we want to do here.
> >
> > > Considering how to integrate this global snapshot feature with the 2PC
> > > patch, what the 2PC patch at least needs to change is to allow an FDW
> > > to store FDW-private data that is passed to subsequent FDW transaction
> > > API calls. Currently, in the 2PC patch, we call the Prepare API for
> > > each participant server one by one, and the core passes only metadata
> > > such as ForeignServer, UserMapping, and the global transaction
> > > identifier. So it's not easy to calculate the maximum CSN across
> > > multiple transaction API calls. I think we can change the 2PC patch to
> > > add a void pointer into FdwXactRslvState, the struct passed from the
> > > core, in order to store FDW-private data. It would be the maximum CSN
> > > in this case. That way, at the first Prepare API call postgres_fdw
> > > allocates the space and stores the CSN there. At subsequent Prepare
> > > API calls it can calculate the maximum CSN, and is then able to do
> > > steps 3 to 6 when preparing the transaction on the last participant.
> > > Another idea would be to change the 2PC patch so that the core passes
> > > a bunch of participants grouped by FDW.
> > >
> >
> > IIUC, with this the coordinator needs to communicate with the nodes
> > twice at the prepare stage: once to prepare the transaction on each
> > node and get the CSN from each node, and then to communicate the
> > MaxCSN to each node?
>
> Yes, I think so too.
>
> > Also, we probably need InDoubt CSN status at prepare phase to
> > make snapshots and global visibility work.
>
> I think it depends on how global CSN feature works.
>
> For instance, in that 2PC patch, if the coordinator crashes while
> preparing a foreign transaction, the global transaction manager
> recovers and regards it as "prepared" regardless of whether the foreign
> transaction has actually been prepared, and it sends ROLLBACK
> PREPARED after recovery completes. With the global CSN patch, as you
> mentioned, at the prepare phase the coordinator needs to communicate
> with the participants twice besides sending PREPARE TRANSACTION:
> pg_global_snapshot_prepare() and pg_global_snapshot_assign().
>
> If global CSN patch needs different cleanup work depending on the CSN
> status, we will need InDoubt CSN status so that the global transaction
> manager can distinguish between a foreign transaction that has
> executed pg_global_snapshot_prepare() and the one that has executed
> pg_global_snapshot_assign().
>
> On the other hand, if it's enough to just send ROLLBACK or ROLLBACK
> PREPARED in that case, I think we don't need InDoubt CSN status. There
> is no difference between those foreign transactions from the global
> transaction manager perspective.
>

I think the InDoubt status helps in checking visibility in the proposed
patch: if we find the status of a transaction to be InDoubt, we wait
till we get a valid CSN for it, as explained in my previous email. So
whether or not we use it for Rollback/Rollback Prepared, it is required
for this design.
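
For example, a visibility check along these lines (a toy sketch with
made-up names, in which the wait is simulated rather than implemented
with a latch) shows why the InDoubt state is needed independently of how
rollback is handled:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t GlobalCSN;

#define CSN_ABORTED		((GlobalCSN) 0)
#define CSN_INDOUBT		((GlobalCSN) 1)		/* illustrative sentinel values */

/* Toy per-xid CSN table standing in for the patch's CSN log. */
static GlobalCSN csn_log[16];

static void
wait_for_csn_resolution(uint32_t xid)
{
	/*
	 * Toy stand-in: pretend the coordinator has now run the equivalent of
	 * pg_global_snapshot_assign() and recorded the final MaxCSN.  A real
	 * implementation would block on a latch until that happens.
	 */
	csn_log[xid] = 900;
}

/*
 * While the writer xid is InDoubt (prepared but not yet assigned its final
 * MaxCSN) we cannot decide visibility, because the eventual CSN could land
 * on either side of snapshot_csn, so we wait for it to be resolved.
 */
static bool
xid_visible_in_snapshot(uint32_t xid, GlobalCSN snapshot_csn)
{
	GlobalCSN	csn = csn_log[xid];

	while (csn == CSN_INDOUBT)
	{
		wait_for_csn_resolution(xid);
		csn = csn_log[xid];
	}

	return csn != CSN_ABORTED && csn < snapshot_csn;
}

int
main(void)
{
	csn_log[5] = CSN_INDOUBT;	/* xid 5 is prepared but still in doubt */
	printf("visible: %d\n", xid_visible_in_snapshot(5, 1000));
	return 0;
}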

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
