Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-01-23 13:00:01
Message-ID: CAGPVpCRMVr0F7DAErbB9fnaqGpGV4t1yf0KWD+WPFe9bz80ykQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for your review.
Attached updated versions of the patches.

wangw(dot)fnst(at)fujitsu(dot)com <wangw(dot)fnst(at)fujitsu(dot)com>, 17 Oca 2023 Sal, 14:15
tarihinde şunu yazdı:

> On Wed, Jan 11, 2023 4:31 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> v3-0001* patch
> ===============
>
> 1. typedefs.list
> I think we also need to add "walrcv_slot_snapshot_fn" to this file.
>

Done.

> v7-0002* patch
> ===============
> 1. About function ReplicationOriginNameForLogicalRep()
> Do we need to modify the API of this function? I think the original API
> could
> also meet the current needs. Since this is not a static function, I think
> it
> seems better to keep the original API if there is no reason. Please let me
> know
> if I'm missing something.
>

You're right.
I still need to modify ReplicationOriginNameForLogicalRep. Origin names are
not tied to relations anymore, so their name doesn't need to
include relation id.
But I didn't really need to change the function signature. I reverted that
part of the change in the updated version of the patch.

> 2. Comment atop the function GetSubscriptionRelReplicationSlot
>

Done

> 3. typo
> + * At this point, there shouldn't be any existing
> replication
> + * origin wit the same name.
>

Done.

> 4. In function CreateSubscription
> + values[Anum_pg_subscription_sublastusedid - 1] = Int64GetDatum(1);
>
> I think it might be better to initialize this field to NULL or 0 here.
> Because in the patch, we always ignore the initialized value when launching
> the sync worker in the function process_syncing_tables_for_apply. And I
> think
> we could document in pg-doc that this value means that no tables have been
> synced yet.
>

I changed it to start from 0 and added a line into the related doc to
indicate that 0 means that no table has been synced yet.

> 5. New member "created_slot" in structure LogicalRepWorker
> + /*
> + * Indicates if the sync worker created a replication slot or it
> reuses an
> + * existing one created by another worker.
> + */
> + bool created_slot;
>
> I think the second half of the sentence looks inaccurate.
> Because I think this flag could be false even when we reuse an existing
> slot
> created by another worker. Assuming the first run for the worker tries to
> sync
> a table which is synced by another sync worker before, and the relstate is
> set
> to SUBREL_STATE_FINISHEDCOPY by another sync worker, I think this flag
> will not
> be set to true. (see function LogicalRepSyncTableStart)
>
> So, what if we simplify the description here and just say that this worker
> already has it's default slot?
>
> If I'm not missing something and you agree with this, please also kindly
> modify
> the relevant comment atop the if-statement
> (!MyLogicalRepWorker->created_slot)
> in the function LogicalRepSyncTableStart.
>

This "created_slot" indicates whether the current worker has created a
replication slot for its own use. If so, created_slot will be true,
otherwise false.
Let's say the tablesync worker has not created its own slot yet in its
previous runs or this is its first run. And the worker decides to reuse an
existing replication slot (which created by another tablesync worker). Then
created_slot is expected to be false. Because this particular
tablesync worker has not created its own slot yet in either of its runs.

In your example, the worker is in its first run and begin to sync a table
whose state is FINISHEDCOPY. If the table's state is FINISHEDCOPY then the
table should already have a replication slot created for its own sync
process. The worker will want to reuse that existing replication slot for
this particular table and it will not create a new replication slot. So
created_slot will be false, because the worker has not actually created any
replication slot yet.

Basically, created_slot is set to true only if "walrcv_create_slot" is
called by the tablesync worker any time during its lifetime. Otherwise,
it's possible that the worker can use existing replication slots for each
table it syncs. (e.g. if all the tables that the worker has synced were in
FINISHEDCOPY state, then the worker will not need to create a new slot).

Does it make sense now? Maybe I need to improve comments to make it clearer.

Best,
--
Melih Mutlu
Microsoft

Attachment Content-Type Size
v4-0001-Add-replication-protocol-cmd-to-create-a-snapshot.patch application/octet-stream 20.6 KB
v8-0002-Reuse-Logical-Replication-Background-worker.patch application/octet-stream 72.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-01-23 13:27:13 Re: Generating code for query jumbling through gen_node_support.pl
Previous Message Hayato Kuroda (Fujitsu) 2023-01-23 12:34:00 RE: Perform streaming logical transactions by background workers and parallel apply