Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-07-21 04:09:15
Message-ID: CAHut+Psx7hg+5rQf-EZ3sP33S+asFN3s7BdhkgqeyfDFZBxjbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Some review comments for v21-0002.

On Thu, Jul 20, 2023 at 11:41 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
> Hi,
>
> Attached the updated patches with recent reviews addressed.
>
> See below for my comments:
>
> Peter Smith <smithpb2250(at)gmail(dot)com>, 19 Tem 2023 Çar, 06:08 tarihinde şunu yazdı:
>>
>> 5.
>> + /* Found a table for next iteration */
>> + finish_sync_worker(true);
>> +
>> + StartTransactionCommand();
>> + ereport(LOG,
>> + (errmsg("logical replication worker for subscription \"%s\" will be
>> reused to sync table \"%s\" with relid %u.",
>> + MySubscription->name,
>> + get_rel_name(MyLogicalRepWorker->relid),
>> + MyLogicalRepWorker->relid)));
>> + CommitTransactionCommand();
>> +
>> + done = false;
>> + break;
>> + }
>> + LWLockRelease(LogicalRepWorkerLock);
>>
>>
>> 5b.
>> Isn't there a missing call to that LWLockRelease, if the 'break' happens?
>
>
> Lock is already released before break, if that's the lock you meant:
>
>> /* Update worker state for the next table */
>> MyLogicalRepWorker->relid = rstate->relid;
>> MyLogicalRepWorker->relstate = rstate->state;
>> MyLogicalRepWorker->relstate_lsn = rstate->lsn;
>> LWLockRelease(LogicalRepWorkerLock);
>>
>>
>> /* Found a table for next iteration */
>> finish_sync_worker(true);
>> done = false;
>> break;
>
>

Sorry, I misread the code. You are right.

======
src/backend/replication/logical/tablesync.c

1.
+ if (!reuse_worker)
+ {
+ ereport(LOG,
+ (errmsg("logical replication table synchronization worker for
subscription \"%s\" has finished",
+ MySubscription->name)));
+ }
+ else
+ {
+ ereport(LOG,
+ (errmsg("logical replication worker for subscription \"%s\" will be
reused to sync table \"%s\" with relid %u.",
+ MySubscription->name,
+ get_rel_name(MyLogicalRepWorker->relid),
+ MyLogicalRepWorker->relid)));
+ }

1a.
We know this must be a tablesync worker, so I think that second errmsg
should also be saying "logical replication table synchronization
worker".

~

1b.
Since this is if/else anyway, is it simpler to be positive and say "if
(reuse_worker)" instead of the negative "if (!reuse_worker)"

~~~

2. run_tablesync_worker
{
+ MyLogicalRepWorker->relsync_completed = false;
+
+ /* Start table synchronization. */
start_table_sync(origin_startpos, &slotname);
This still contains the added comment that I'd previously posted I
thought was adding anything useful. Also, I didn't think this comment
exists in the HEAD code.
======
src/backend/replication/logical/worker.c

3. LogicalRepApplyLoop

+ /*
+ * apply_dispatch() may have gone into apply_handle_commit()
+ * which can call process_syncing_tables_for_sync.
+ *
+ * process_syncing_tables_for_sync decides whether the sync of
+ * the current table is completed. If it is completed,
+ * streaming must be already ended. So, we can break the loop.
+ */
+ if (am_tablesync_worker() &&
+ MyLogicalRepWorker->relsync_completed)
+ {
+ endofstream = true;
+ break;
+ }
+

Maybe just personal taste, but IMO it is better to rearrange like
below because then there is no reason to read the long comment except
for tablesync workers.

if (am_tablesync_worker())
{
/*
* apply_dispatch() may have gone into apply_handle_commit()
* which can call process_syncing_tables_for_sync.
*
* process_syncing_tables_for_sync decides whether the sync of
* the current table is completed. If it is completed,
* streaming must be already ended. So, we can break the loop.
*/
if (MyLogicalRepWorker->relsync_completed)
{
endofstream = true;
break;
}
}

~~~

4. LogicalRepApplyLoop

+
+ /*
+ * If relsync_completed is true, this means that the tablesync
+ * worker is done with synchronization. Streaming has already been
+ * ended by process_syncing_tables_for_sync. We should move to the
+ * next table if needed, or exit.
+ */
+ if (am_tablesync_worker() &&
+ MyLogicalRepWorker->relsync_completed)
+ endofstream = true;

Ditto the same comment about rearranging the condition, as #3 above.

======
src/include/replication/worker_internal.h

5.
+ /*
+ * Indicates whether tablesync worker has completed syncing its assigned
+ * table.
+ */
+ bool relsync_completed;
+

Isn't it better to arrange this to be adjacent to other relXXX fields,
so they all clearly belong to that "Used for initial table
synchronization." group?

For example, something like:

/* Used for initial table synchronization. */
Oid relid;
char relstate;
XLogRecPtr relstate_lsn;
slock_t relmutex;
bool relsync_completed; /* has tablesync finished syncing
the assigned table? */

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mingli Zhang 2023-07-21 04:13:35 Re: Buildfarm failures on chipmunk
Previous Message vignesh C 2023-07-21 03:40:54 Buildfarm failures on chipmunk