Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-01-21 10:17:23
Message-ID: CAA4eK1LGxuB_RTfZ2HLJT76wv=FLV6UPqT+FWkiDg61rvQkkmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Amit.
>
> PSA the v17 patch for the Tablesync Solution1.
>

Thanks for the updated patch. Below are few comments:
1. Why are we changing the scope of PG_TRY in DropSubscription()?
Also, it might be better to keep the replication slot drop part as it
is.

2.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a
FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.

In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE?

3.
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);

The function name and message makes it sound like that we drop slot
and origin at any interrupt. Isn't it better to name it as
tablesync_cleanup_at_shutdown()?

4.
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */

I think we can avoid using missing_ok and drop_slot_needed variables.

5. Can we drop the origin along with the slot in
process_syncing_tables_for_sync() instead of
process_syncing_tables_for_apply()? I think this is possible because
of the other changes you made in origin.c. Also, if possible, we can
try to use the same code to drop the slot and origin in
tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync.

6.
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */

There seems to be a typo (crashed/etc) in the above comment.

7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for datasync state like below?
004_sync.pl has some other similar test.
my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-01-21 10:23:16 Re: Printing LSN made easy
Previous Message Heikki Linnakangas 2021-01-21 10:14:10 Re: ResourceOwner refactoring