Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2022-07-12 12:24:59
Message-ID: CAA4eK1LrUX5F1Bj8vAW-tvVt3isn+xVj8JdRxQA2xeCYz0FB8w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 8, 2022 at 10:26 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
>> I think that won't work because the slot won't be fixed across
>> restarts. It is then possible that we drop the wrong slot when the
>> state of the rel being copied is SUBREL_STATE_DATASYNC. It is also
>> possible that, while creating a slot, we fail because a slot with the
>> same name already exists, created by some other worker that has since
>> been restarted. And what about origin_name, won't that have similar
>> problems? Also, if the state is already SUBREL_STATE_FINISHEDCOPY and
>> the slot is not the same one used in the previous run of a particular
>> worker, it may start WAL streaming from a different point based on the
>> slot's confirmed_flush_location.
>
>
> You're right, Amit. In case of a failure, the tablesync phase of a relation may continue with a different worker and replication slot due to this change in naming.
> It seems the same replication slot should be used from start to end for a relation during tablesync. However, creating/dropping replication slots can be a major overhead in some cases.
> It would be nice if these slots could somehow be reused.
>
> To overcome this issue, I've been thinking about making some changes in my patch.
> So far, my proposal would be as follows:
>
> Slot naming can be like pg_<sub_id>_<worker_pid> instead of pg_<sub_id>_<slot_index>. This way, each worker can use the same replication slot during its lifetime.
> But if a worker is restarted, it will switch to a new replication slot since its pid has changed.
>

I think using worker_pid also has a similar risk of mixing up slots from
different workers, because after a restart the same worker_pid could be
assigned to a totally different worker. Can we think of using a unique
64-bit number instead? It would be allocated when each worker starts for
the very first time, and after that we can refer to the catalog to find
it, as suggested in the idea below.
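
Just to illustrate what I mean (the names below are purely illustrative,
not the actual code), the slot and origin names could be derived from the
subscription OID plus that persistent id, for example:

/*
 * Illustrative sketch only: derive tablesync slot/origin names from a
 * 64-bit id that is allocated once per worker and remembered in the
 * catalog, so a restarted worker rebuilds exactly the same names.
 */
#include <stdint.h>
#include <stdio.h>

#define NAMEDATALEN 64          /* PostgreSQL's limit on slot names */

static void
tablesync_names(char *slot, char *origin, uint32_t subid, uint64_t syncid)
{
    snprintf(slot, NAMEDATALEN, "pg_%u_sync_%llu",
             (unsigned) subid, (unsigned long long) syncid);
    snprintf(origin, NAMEDATALEN, "pg_%u_origin_%llu",
             (unsigned) subid, (unsigned long long) syncid);
}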

> The pg_subscription_rel catalog can store the replication slot name for each non-ready relation. Then we can find the slot needed for that particular relation to complete tablesync.
>

Yeah, this is worth investigating. However, instead of storing the
slot_name, we can store just the unique number (as suggested above).
We should use the same for the origin name as well.

> If a worker syncs a relation without any error, everything works well and this new replication slot column in the catalog will not be needed.
> However, if a worker is restarted due to a failure, the previous run of that worker left its slot behind since it did not exit properly.
> The restarted worker (with a different pid) will see that the relation is actually in SUBREL_STATE_FINISHEDCOPY and will want to proceed to the catchup step.
> The worker can then look up that particular relation's replication slot in the pg_subscription_rel catalog (the slot name should be there since the relation state is not ready), and tablesync can proceed with that slot.
>
> There might be some cases where replication slots are left behind. An example would be when the slot has been removed from the pg_subscription_rel catalog and detached from any relation, but the slot itself could not actually be dropped for some reason. For such cases, a slot cleanup logic is needed. This cleanup can also be done by tablesync workers.
> Whenever a tablesync worker is created, it can look for existing replication slots that do not belong to any relation or any worker (the slot name includes the pid for that), and drop those slots if it finds any.
>

This sounds tricky. Why not first drop the slot/origin and then detach it
from pg_subscription_rel? On restart, it is possible that we error out
after dropping the slot or origin but before updating the catalog entry,
but in such a case we can simply ignore the missing slot/origin and
detach them from pg_subscription_rel. Also, if we use the unique number
as suggested above, I think it should be okay even if we don't remove it
after the relation state becomes ready.
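
In code terms, the ordering I have in mind is roughly this (the helpers
below are hypothetical stand-ins, not existing functions):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real drop and catalog-update routines. */
static void drop_tablesync_slot(uint64_t syncid, bool missing_ok) { (void) syncid; (void) missing_ok; }
static void drop_tablesync_origin(uint64_t syncid, bool missing_ok) { (void) syncid; (void) missing_ok; }
static void clear_subscription_rel_syncid(uint32_t subid, uint32_t relid) { (void) subid; (void) relid; }

/*
 * Drop the slot and origin first, then detach the id from
 * pg_subscription_rel.  If we error out in between, the next attempt
 * simply re-runs the drops with missing_ok = true and ignores
 * "does not exist".
 */
static void
finish_tablesync_cleanup(uint32_t subid, uint32_t relid, uint64_t syncid)
{
    drop_tablesync_slot(syncid, true);
    drop_tablesync_origin(syncid, true);
    clear_subscription_rel_syncid(subid, relid);
}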

> What do you think about this new way of handling slots? Do you see any points of concern?
>
> I'm currently working on adding this change to the patch and would appreciate any comments.
>

Thanks for making progress!

--
With Regards,
Amit Kapila.
