Re: Single transaction in the tablesync worker?

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-22 11:13:12
Message-ID: CAHut+PuCLty2HGNT6neyOcUmBNxOLo=ybQ2Yv-nTR4kFY-8QLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit.

PSA my v6 WIP patch for the Solution1.

This patch still applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com

(I understand you would like this to be delivered as a separate patch
independent of v30. I will convert it ASAP)

====

Coded / WIP:

* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.

* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions

* tablesync worked now allowing multiple tx instead of single tx

* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.

* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.

* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.

* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply

TODO / Known Issues:

* Crashed tablesync workers may not be known to DropSubscription
current code. This might be a problem to cleanup slots and/or origin
tracking belonging to those unknown workers.

* There seems to be a race condition during DROP SUBSCRIPTION. It
manifests as the TAP test 007 hanging. Logging shows it seems to be
during replorigin_drop when called from DropSubscription. It is timing
related and quite rare - e.g. Only happens 1x every 10x running
subscription TAP tests.

* Help / comments / cleanup

* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development

* Address review comments

---

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
v6-0002-WIP-patch-for-the-Solution1.patch application/octet-stream 25.6 KB
v6-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch application/octet-stream 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-12-22 11:19:26 Re: [HACKERS] [PATCH] Generic type subscripting
Previous Message Thomas Munro 2020-12-22 11:06:50 Re: pg_preadv() and pg_pwritev()