Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2022-07-29 10:17:44
Message-ID: CAA4eK1+PsvbjSH7yiCdCn=zcpAdM01MK=wdOuq879JcQqu6jwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 28, 2022 at 9:32 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>>
>> Why after step 4, do you need to drop the replication slot? Won't just
>> clearing the required info from the catalog be sufficient?
>
>
> The replication slots that we read from the catalog will not be used for anything else after we're done with syncing the table which the rep slot belongs to.
> It's removed from the catalog when the sync is completed and it basically becomes a slot that is not linked to any table or worker. That's why I think it should be dropped rather than left behind.
>
> Note that if a worker dies and its replication slot continues to exist, that slot will only be used to complete the sync process of the one table that the dead worker was syncing but couldn't finish.
> When that particular table is synced and becomes ready, the replication slot has no use anymore.
>

Why can't it be used to sync the other tables if any?

>>
>> Hmm, I think even if there is an iota of a chance which I think is
>> there, we can't use worker_pid. Assume, that if the same worker_pid is
>> assigned to another worker once the worker using it got an error out,
>> the new worker will fail as soon as it will try to create a
>> replication slot.
>
>
> Right. If something like that happens, worker will fail without doing anything. Then a new one will be launched and that one will continue to do the work.
> The worst case might be having conflicting pid over and over again while also having replication slots whose name includes one of those pids still exist.
> It seems unlikely but possible, yes.
>
>>
>> I feel it would be better or maybe we need to think of some other
>> identifier but one thing we need to think about before using a 64-bit
>> unique identifier here is how will we retrieve its last used value
>> after restart of server. We may need to store it in a persistent way
>> somewhere.
>
>
> We might consider storing this info in a catalog again. Since this last used value will be different for each subscription, pg_subscription can be a good place to keep that.
>

This sounds reasonable. Let's do this unless we get some better idea.

>>
>> The problems will be similar to the slot name. The origin is used to
>> track the progress of replication, so, if we use the wrong origin name
>> after the restart, it can send the wrong start_streaming position to
>> the publisher.
>
>
> I understand. But origin naming logic is still the same. Its format is like pg_<subid>_<relid> .
> I did not need to change this since it seems to me origins should belong to only one table. The patch does not reuse origins.
> So I don't think this change introduces an issue with origin. What do you think?
>

There is no such restriction that origins should belong to only one
table. What makes you think like that?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-07-29 10:25:10 Re: Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.
Previous Message Alvaro Herrera 2022-07-29 09:59:00 Re: Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.