Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2022-08-05 13:55:09
Message-ID: CAGPVpCQazGgGu60YXfdoUqqjn9-B7vP7=DTNCZfp51Fg==pQ9w@mail.gmail.com
Lists: pgsql-hackers

Hi Amit,

> >> Why after step 4, do you need to drop the replication slot? Won't just
> >> clearing the required info from the catalog be sufficient?
> >
> > The replication slots that we read from the catalog will not be used for
> > anything else after we're done with syncing the table which the rep slot
> > belongs to.
> > It's removed from the catalog when the sync is completed and it
> > basically becomes a slot that is not linked to any table or worker. That's
> > why I think it should be dropped rather than left behind.
> >
> > Note that if a worker dies and its replication slot continues to exist,
> > that slot will only be used to complete the sync process of the one table
> > that the dead worker was syncing but couldn't finish.
> > When that particular table is synced and becomes ready, the replication
> > slot has no use anymore.
> >
>
> Why can't it be used to sync the other tables if any?
>

It can be used, but I thought it would be better not to, for example in the
following case:
Let's say a sync worker starts with a table in INIT state. The worker
creates a new replication slot to sync that table.
When the sync of that table is completed, the worker moves on to the next
one. This time the new table may be in FINISHEDCOPY state, so the worker may
need to use the new table's existing replication slot.
By the time the worker moves on to the next table again, it will have used
two replication slots. We might want to keep one and drop the other.
At this point, I thought it would be better to keep the replication slot
that this worker created in the first place. I think it's easier to track
slots this way, since we know how to generate the replication slot's name.
Otherwise we would need to store the replication slot name somewhere too.
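To make the scenario above concrete, here is a rough C sketch of how a reused sync worker might choose which slot to use and which one to drop. Everything in it is hypothetical and simplified: the SyncState values only mirror the INIT/FINISHEDCOPY/READY states mentioned above (they are not PostgreSQL's SUBREL_STATE_* definitions), and drop_slot() merely stands in for dropping the slot on the publisher. The actual patch may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified sync states named after the ones discussed
 * above; not PostgreSQL's real SUBREL_STATE_* values. */
typedef enum { SYNC_INIT, SYNC_FINISHEDCOPY, SYNC_READY } SyncState;

typedef struct
{
    unsigned    relid;          /* table being synced */
    SyncState   state;
    char        slot[64];       /* slot recorded for this table; "" if none */
} TableSync;

typedef struct
{
    char        own_slot[64];   /* slot this worker created; "" if none */
    int         drops;          /* number of slots this worker dropped */
} SyncWorker;

/* Hypothetical stand-in for dropping a slot on the publisher. */
static void
drop_slot(SyncWorker *w, const char *name)
{
    printf("dropping slot %s\n", name);
    w->drops++;
}

/* Pick the slot to sync table t with. */
static const char *
choose_slot(SyncWorker *w, TableSync *t, const char *new_slot_name)
{
    /* A table whose copy already finished keeps using the slot it was
     * copied with, e.g. one left behind by a worker that died. */
    if (t->state == SYNC_FINISHEDCOPY && t->slot[0] != '\0')
        return t->slot;

    /* Otherwise use the worker's own slot, creating it on first use. */
    if (w->own_slot[0] == '\0')
        snprintf(w->own_slot, sizeof(w->own_slot), "%s", new_slot_name);
    snprintf(t->slot, sizeof(t->slot), "%s", w->own_slot);
    return w->own_slot;
}

/* Mark t as synced; drop its slot unless it is the worker's own slot. */
static void
finish_table(SyncWorker *w, TableSync *t)
{
    if (strcmp(t->slot, w->own_slot) != 0)
        drop_slot(w, t->slot);
    t->slot[0] = '\0';          /* slot is no longer linked to this table */
    t->state = SYNC_READY;
}
```

With this rule, a worker that first syncs an INIT table and then a FINISHEDCOPY table drops only the inherited slot and keeps its own slot for the next table.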

> This sounds reasonable. Let's do this unless we get some better idea.
>

I updated the patch to use a unique ID for replication slot names and to
store the last used ID in the catalog.
Could you please take another look?
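A minimal sketch of such a naming scheme, assuming a per-subscription catalog field that holds the last used ID. The struct, field, and function names and the "pg_<subid>_sync_<id>" format are hypothetical illustrations, not the patch's actual catalog columns or slot-name format.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-subscription catalog row that
 * remembers the last slot ID handed out. */
typedef struct
{
    unsigned    subid;          /* subscription OID */
    uint64_t    last_slot_id;   /* last ID used in a slot name */
} SubscriptionSlotInfo;

/* Allocate the next ID and build a slot name from it.  In a real server
 * the new value would presumably have to be persisted in the catalog
 * under a lock, so concurrent sync workers never get the same ID. */
static uint64_t
next_sync_slot_name(SubscriptionSlotInfo *sub, char *name, size_t len)
{
    uint64_t    id = ++sub->last_slot_id;

    snprintf(name, len, "pg_%u_sync_%llu",
             sub->subid, (unsigned long long) id);
    return id;
}
```

Because a name built this way depends only on the subscription and a counter rather than on a table's OID, the slot can keep its name while a worker reuses it across tables. Persisting the counter is what keeps IDs from being handed out twice after a restart.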

> There is no such restriction that origins should belong to only one
> table. What makes you think like that?
>

I did not reuse origins because I didn't think doing so would improve
performance as significantly as reusing replication slots does.
So I kept the origins as they were, even though it would be possible to
reuse them. Does that make sense?

Best,
Melih

Attachment: v3-0001-Reuse-Logical-Replication-Background-worker.patch (application/octet-stream, 54.1 KB)
