Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2022-07-28 16:02:43
Message-ID: CAGPVpCRdDBRav4AR6NgAg+7HBfokppWwJkmjwzjWeuy1i7HYqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> Why after step 4, do you need to drop the replication slot? Won't just
> clearing the required info from the catalog be sufficient?
>

The replication slots that we read from the catalog will not be used for
anything else after we're done with syncing the table which the rep slot
belongs to.
It's removed from the catalog when the sync is completed and it basically
becomes a slot that is not linked to any table or worker. That's why I
think it should be dropped rather than left behind.

Note that if a worker dies and its replication slot continues to exist,
that slot will only be used to complete the sync process of the one table
that the dead worker was syncing but couldn't finish.
When that particular table is synced and becomes ready, the replication
slot has no use anymore.

> Hmm, I think even if there is an iota of a chance which I think is
> there, we can't use worker_pid. Assume, that if the same worker_pid is
> assigned to another worker once the worker using it got an error out,
> the new worker will fail as soon as it will try to create a
> replication slot.
>

Right. If something like that happens, worker will fail without doing
anything. Then a new one will be launched and that one will continue to
do the work.
The worst case might be having conflicting pid over and over again while
also having replication slots whose name includes one of those pids still
exist.
It seems unlikely but possible, yes.

> I feel it would be better or maybe we need to think of some other
> identifier but one thing we need to think about before using a 64-bit
> unique identifier here is how will we retrieve its last used value
> after restart of server. We may need to store it in a persistent way
> somewhere.
>

We might consider storing this info in a catalog again. Since this last
used value will be different for each subscription, pg_subscription can be
a good place to keep that.

> The problems will be similar to the slot name. The origin is used to
> track the progress of replication, so, if we use the wrong origin name
> after the restart, it can send the wrong start_streaming position to
> the publisher.
>

I understand. But origin naming logic is still the same. Its format is like
pg_<subid>_<relid> .
I did not need to change this since it seems to me origins should belong to
only one table. The patch does not reuse origins.
So I don't think this change introduces an issue with origin. What do you
think?

Thanks,
Melih

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-07-28 16:13:05 Re: BUG #17434: CREATE/DROP DATABASE can be executed in the same transaction with other commands
Previous Message Alvaro Herrera 2022-07-28 15:59:11 Re: making relfilenodes 56 bits