Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-21 05:25:07
Message-ID: CAA4eK1LLWo8X8OFLuupW8iCYY+FnMLNBqm4XtKJV_YCuL02rQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 19, 2020 at 12:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
>
> I understand why you are trying to create this patch atop logical
> decoding of 2PC patch but I think it is better to create this as an
> independent patch and then use it to test 2PC problem. Also, please
> explain what kind of testing you did to ensure that it works properly
> after the table sync worker restarts after the crash.
>

Few other comments:
==================
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
removing them.
+ */
+ if (slotname)
+ {

The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?

2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}

With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once drop the apply and
table sync slots if any.

3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();

Why are we suppressing the error instead of handling it the error in
the same way as we do while dropping the apply worker slot in
DropSubscription?

4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();

+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);

This is not how we export functions at other places?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-12-21 06:31:38 Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
Previous Message Tom Lane 2020-12-21 03:10:35 Re: \gsetenv