Re: [HACKERS] logical decoding of two-phase transactions

From: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-04-02 07:49:24
Message-ID: CAMGcDxeUFoGM-nVent3qiOaYKr3KdxRfL=BDhS0vpjM-oLha_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

> Thanks. I think the README is a good start, but I think we also need to
> improve the comments, which is usually more detailed than the README.
> For example, it's not quite acceptable that LogicalLockTransaction and
> LogicalUnlockTransaction have about no comments, especially when it's
> meant to be public API for decoding plugins.
>

Additional documents around the APIs incorporated from your review patch.

>
>>>
>>> 2) regression tests
>>> -------------------
>> They are long-winded queries and IMO made the test file look too
>> cluttered and verbose..
>>
>
> Well, I don't think that's a major problem, and it certainly makes it
> more difficult to investigate regression failures.
>

Changed the test files to use the actual queries everywhere now.

> Now, the new bits ... attached is a .diff with a couple of changes and
> comments on various places.
>
> 1) LogicalLockTransaction
>
> - This function is part of a public API, yet it has no comment. That
> needs fixing - it has to be clear how to use it. The .diff suggests a
> comment, but it may need improvements.
>

Done.

>
> - As I mentioned in the previous review, BecomeDecodeGroupLeader is a
> misleading name. It suggest the called becomes a leader, while in fact
> it looks up the PROC running the XID and makes it a leader. This is
> obviously due to copying the code from lock groups, where the caller
> actually becomes the leader. It's incorrect here. I suggest something
> like LookupDecodeGroupLeader() or something.
>

Done. Used AssignDecodeGroupLeader() as the function name now.

>
> - In the "if (MyProc->decodeGroupLeader == NULL)" block there are two
> blocks rechecking the transaction status:
>
> if (proc == NULL)
> { ... recheck ... }
>
> if (!BecomeDecodeGroupMember(proc, proc->pid, rbtxn_prepared(txn)))
> { ... recheck ...}
>
> I suggest to join them into a single block.
>

Done. Combined into a single block.

>
> - This Assert() is either bogus and there can indeed be cases with
> (MyProc->decodeGroupLeader==NULL), or the "if" is unnecessary:
>
> Assert(MyProc->decodeGroupLeader);
>
> if (MyProc->decodeGroupLeader) { ... }
>

Done. Removed the assert now.

> - I'm wondering why we're maintaining decodeAbortPending flags both for
> the leader and all the members. ISTM it'd be perfectly fine to only
> check the leader, particularly because RemoveDecodeGroupMemberLocked
> removes the members from the decoding group. So that seems unnecessary,
> and we can remove the
>
> if (MyProc->decodeAbortPending)
> { ... }
>

IMO, this looked clearer that each proc has been notified that an
abort is pending.

> - LogicalUnlockTransaction needs a comment(s) too.
>

Done.

>
> 2) BecomeDecodeGroupLeader
>
> - It can bail out when (!proc), which will simplify the code a bit.
>

Done.

> - Why does it check PID of the process at all? Seems unnecessary,
> considering we're already checking the XID.
>

Agreed. Especially for the current case of 2PC, the proc will have 0 as pid.

> - Can a proc executing a XID have a different leader? I don't think so,
> so I'd make that an Assert().
>
> Assert(!proc || (proc->decodeGroupLeader == proc));
>
> And it'll allow simplification of some of the conditions.
>

Done.

> - We're only dealing with prepared transactions now, so I'd just drop
> the is_prepared flag - it'll make the code a bit simpler, we can add it
> later in patch adding decoding of regular in-progress transactions. We
> can't test the (!is_prepared) anyway.
>

Done.

> - Why are we making the leader also a member of the group? Seems rather
> unnecessary, and it complicates the abort handling, because we need to
> skip the leader when deciding to wait.
>

The leader is part of the decode group. And other than not waiting for ourself
at abort time, no other coding complications are there AFAICS.

>
> 3) LogicalDecodeRemoveTransaction
>
> - It's not clear to me what happens when a decoding backend gets killed
> between LogicalLockTransaction/LogicalUnlockTransaction. Doesn't that
> mean LogicalDecodeRemoveTransaction will get stuck, because the proc is
> still in the decoding group?
>

SIGSEGV, SIGABRT, SIGKILL will all cause the PG instance to restart because of
possible shmem corruption issues. So I don't think the above scenario
will arise. I also did not see any related handling in the parallel
lock group case as well.

>
> 4) a bunch of comment / docs improvements, ...
>
> I'm suggesting rewording a couple of comments. I've also added a couple
> of missing comments - e.g. to LogicalLockTransaction and the lock group
> methods in general.
>
> Also, a couple more questions and suggestions in XXX comments.
>

Incorporated relevant changes in the new patchset.

Andres, Petr:

As discussed, I have now added lock/unlock API calls around the
"apply_change" callback. This callback is now free to consult catalog
metadata without worrying about a concurrent rollback operation. Have
removed direct logicallock/logicalunlock calls from inside the
pgoutput and test_decoding plugins now. Also modified the sgml
documentation appropriately.

Am looking at how we can further optimize this by looking at the two
approaches about signaling about abort or adding abort related info in
the context, but this will be an additional patch over this patch set
anyways.

Regards,
Nikhils

Attachment Content-Type Size
0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.0204.patch application/octet-stream 7.3 KB
0002-Introduce-LogicalLockTransaction-LogicalUnlockTransa.0204.patch application/octet-stream 28.5 KB
0003-Support-decoding-of-two-phase-transactions-at-PREPAR.0204.patch application/octet-stream 32.7 KB
0004-pgoutput-output-plugin-support-for-logical-decoding-.0204.patch application/octet-stream 33.2 KB
0005-Teach-test_decoding-plugin-to-work-with-2PC.0204.patch application/octet-stream 25.3 KB
0006-Optional-Additional-test-case-to-demonstrate-decoding-rollbac.0204.patch application/octet-stream 9.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2018-04-02 08:23:10 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Michael Paquier 2018-04-02 06:51:49 check_ssl_key_file_permissions should be in be-secure-common.c