Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-01-27 10:11:56
Message-ID: CAJpy0uAo+AWTMU9s7bJUXJuyd-NvMM290Au2BMtQMfpmviXLNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 26, 2023 at 7:53 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
> If a relation is currently being synced by a tablesync worker and uses a replication slot/origin for that operation, then srrelslotname and srreloriginname fields will have values.
> When a relation is done with its replication slot/origin, their info gets removed from related catalog row, so that slot/origin can be reused for another table or dropped if not needed anymore.
> In your case, all relations are in READY state so it's expected that srrelslotname and srreloriginname are empty. READY relations do not need a replication slot/origin anymore.
>
> Tables are probably synced so quickly that you're missing the moments when a tablesync worker copies a relation and stores its rep. slot/origin in the catalog.
> If initial sync is long enough, then you should be able to see the columns get updated. I follow [1] to make it longer and test if the patch really updates the catalog.
>

Thank You for the details. It is clear now.
>

>
> Rebased and resolved conflicts. Please check the new version
>
Please find my suggestions on v9:

1.
--Can we please add a few more points to the Summary to make it more clear.
a) something telling that reusability of workers is for tables under
one subscription and not across multiple subscriptions.
b) Since we are reusing both workers and slots, can we add:
--when do we actually end up spawning a new worker
--when do we actually end up creating a new slot in a worker rather
than using existing one.
--if we reuse existing slots, what happens to the snapshot?

2.
+ The last used ID for tablesync workers. This ID is used to
+ create replication slots. The last used ID needs to be stored
+ to make logical replication can safely proceed after any interruption.
+ If sublastusedid is 0, then no table has been synced yet.

--typo:
to make logical replication can safely proceed ==> to make logical
replication safely proceed

--Also, does it sound better:
The last used ID for tablesync workers. It acts as an unique
identifier for replication slots
which are created by table-sync workers. The last used ID needs to be
persisted...

3.
is_first_run;
move_to_next_rel;
--Do you think one variable is enough here as we do not get any extra
info by using 2 variables? Can we have 1 which is more generic like
'ready_to_reuse'. Otherwise, please let me know if we must use 2.

4.
/* missin_ok = true, since the origin might be already dropped. */
typo: missing_ok

5. GetReplicationSlotNamesBySubId:
errmsg("not tuple returned."));

Can we have a better error msg:
ereport(ERROR,
errmsg("could not receive list of slots
associated with subscription %d, error: %s", subid, res->err));

6.
static void
clean_sync_worker(void)

--can we please add introductory comment for this function.

7.
/*
* Pick the table for the next run if there is not another worker
* already picked that table.
*/
Pick the table for the next run if it is not already picked up by
another worker.

8.
process_syncing_tables_for_sync()

/* Cleanup before next run or ending the worker. */
--can we please improve this comment:
if there is no more work left for this worker, stop the worker
gracefully, else do clean-up and make it ready for the next
relation/run.

9.
LogicalRepSyncTableStart:
* Read previous slot name from the catalog, if exists.
*/
prev_slotname = (char *) palloc0(NAMEDATALEN);
Do we need to free this at the end?

10.
if (strlen(prev_slotname) == 0)
{
elog(ERROR, "Replication slot could not be
found for relation %u",
MyLogicalRepWorker->relid);
}
shall we mention subid also in error msg.

I am reviewing further...
thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2023-01-27 10:15:04 Re: Deadlock between logrep apply worker and tablesync worker
Previous Message Peter Eisentraut 2023-01-27 10:00:01 Re: improving user.c error messages