Re: Global snapshots

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, movead(dot)li(at)highgo(dot)ca, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global snapshots
Date: 2020-07-03 06:48:16
Message-ID: CA+fd4k6oZtO-MFYmunHVecGaTWre8YKDNTSfX9hZhQh6Kui1kA@mail.gmail.com
Lists: pgsql-hackers

On Sat, 20 Jun 2020 at 21:21, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> >
> > On 6/19/20 11:48 AM, Amit Kapila wrote:
> > > On Wed, Jun 10, 2020 at 8:36 AM Andrey V. Lepikhov
> > > <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> > >> On 09.06.2020 11:41, Fujii Masao wrote:
> > >>> The patches seem not to be registered in CommitFest yet.
> > >>> Are you planning to do that?
> > >> Not now. It is a sharding-related feature. I'm not sure that this
> > >> approach is fully consistent with the sharding way now.
> > > Can you please explain in detail, why you think so? There is no
> > > commit message explaining what each patch does so it is difficult to
> > > understand why you said so?
> > For now I used this patch set for providing correct visibility in the
> > case of access to the table with foreign partitions from many nodes in
> > parallel. So I saw at this patch set as a sharding-related feature, but
> > [1] shows another useful application.
> > CSN-based approach has weak points such as:
> > 1. Dependency on clocks synchronization
> > 2. Needs guarantees of monotonically increasing of the CSN in the case
> > of an instance restart/crash etc.
> > 3. We need to delay increasing of OldestXmin because it can be needed
> > for a transaction snapshot at another node.
> >
>
> So, is anyone working on improving these parts of the patch. AFAICS
> from what Bruce has shared [1], some people from HighGo are working on
> it but I don't see any discussion of that yet.
>
> > So I do not have full conviction that it will be better than a single
> > distributed transaction manager.
> >
>
> When you say "single distributed transaction manager" do you mean
> something like pg_dtm which is inspired by Postgres-XL?
>
> > Also, can you let us know if this
> > > supports 2PC in some way and if so how is it different from what the
> > > other thread on the same topic [1] is trying to achieve?
> > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > 2PC machinery. Now I'd not judge which approach is better.
> >
>

Sorry for the late reply.

> Yeah, I have studied both the approaches a little and I feel the main
> difference seems to be that in this patch atomicity is tightly coupled
> with how we achieve global visibility, basically in this patch "all
> running transactions are marked as InDoubt on all nodes in prepare
> phase, and after that, each node commit it and stamps each xid with a
> given GlobalCSN.". There are no separate APIs for
> prepare/commit/rollback exposed by postgres_fdw as we do it in the
> approach followed by Sawada-San's patch. It seems to me in the patch
> in this email one of postgres_fdw node can be a sort of coordinator
> which prepares and commit the transaction on all other nodes whereas
> that is not true in Sawada-San's patch (where the coordinator is a
> local Postgres node, am I right Sawada-San?).

Yeah, where foreign transactions are managed is different: postgres_fdw
manages foreign transactions in this patch, whereas the PostgreSQL core
does that in the 2PC patch.

>
> I feel if Sawada-San or someone involved in another patch also once
> studies this approach and try to come up with some form of comparison
> then we might be able to make better decision. It is possible that
> there are few good things in each approach which we can use.
>

I studied this patch and did a simple comparison between this patch
(0002 patch) and my 2PC patch.

In terms of atomic commit, the features that are implemented in the
2PC patch but not in this patch are:

* Crash safety.
* PREPARE TRANSACTION command support.
* Query cancellation while waiting for the commit.
* Automatic in-doubt transaction resolution.

On the other hand, the feature that is implemented in this patch but
not in the 2PC patch is:

* Executing PREPARE TRANSACTION (and other commands) in parallel.

When the 2PC patch was first proposed, IIRC it was like this patch
(the 0002 patch): it changed only postgres_fdw to support 2PC. But
after discussion, we changed the approach so that the core manages
foreign transactions, for crash safety. From my perspective, this
patch has a minimal implementation of 2PC, just enough to make the
global snapshot feature work, and is missing some features that are
important for crash-safe atomic commit. So I personally think we
should consider how to integrate this global snapshot feature with the
2PC patch, rather than improving this patch, if we want crash-safe
atomic commit.

Looking at the commit procedure with this patch:

When starting a new transaction on a foreign server, postgres_fdw
executes pg_global_snapshot_import() to import the global snapshot.
After some work, in the pre-commit phase we do:

1. generate a global transaction id, say 'gid'
2. execute PREPARE TRANSACTION 'gid' on all participants
3. prepare the global snapshot locally, if the local node is also
involved in the transaction
4. execute pg_global_snapshot_prepare('gid') on all participants

During steps 2 to 4, we calculate the maximum CSN from the CSNs
returned by each pg_global_snapshot_prepare() execution.

5. assign the global snapshot locally, if the local node is also
involved in the transaction
6. execute pg_global_snapshot_assign('gid', max-csn) on all participants

Then, we commit locally (i.e. mark the current transaction as
committed in clog).

After that, in the post-commit phase, we execute COMMIT PREPARED 'gid'
on all participants.

Considering how to integrate this global snapshot feature with the 2PC
patch, what the 2PC patch needs to change, at a minimum, is to allow
an FDW to store FDW-private data that is passed to subsequent FDW
transaction API calls. In the current 2PC patch, we call the Prepare
API for each participant server one by one, and the core passes only
metadata such as ForeignServer, UserMapping, and the global
transaction identifier. So it's not easy to calculate the maximum CSN
across multiple transaction API calls. I think we can change the 2PC
patch to add a void pointer to FdwXactRslvState, the struct passed
from the core, in order to store FDW-private data, which in this case
would be the maximum CSN. That way, at the first Prepare API call
postgres_fdw allocates the space and stores the CSN there. At
subsequent Prepare API calls it can update the maximum CSN, and then
perform steps 3 to 6 when preparing the transaction on the last
participant. Another idea would be to change the 2PC patch so that the
core passes a bunch of participants grouped by FDW.

I’ve not read this patch deeply yet and have considered it without
writing any code, but my first impression is that it would not be hard
to integrate this feature with the 2PC patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
