Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2022-08-15 11:26:26
Message-ID: CAGPVpCT0+_np5z9jj8OcFNtzoeSSoT+fObNQNFDjPeB12HMinw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit,

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, 6 Ağu 2022 Cmt, 16:01 tarihinde şunu
yazdı:

> I think there is some basic flaw in slot reuse design. Currently, we
> copy the table by starting a repeatable read transaction (BEGIN READ
> ONLY ISOLATION LEVEL REPEATABLE READ) and create a slot that
> establishes a snapshot which is first used for copy and then LSN
> returned by it is used in the catchup phase after the copy is done.
> The patch won't establish such a snapshot before a table copy as it
> won't create a slot each time. If this understanding is correct, I
> think we need to use ExportSnapshot/ImportSnapshot functionality to
> achieve it or do something else to avoid the problem mentioned.
>

I did not really think about the snapshot created by replication slot while
making this change. Thanks for pointing it out.
I've been thinking about how to fix this issue. There are some points I'm
still not sure about.
If the worker will not create a new replication slot, which snapshot should
we actually export and then import?
At the line where the worker was supposed to create replication slot but
now will reuse an existing slot instead, calling pg_export_snapshot() can
export the snapshot instead of CREATE_REPLICATION_SLOT.
However, importing that snapshot into the current transaction may not make
any difference since we exported that snapshot from the same transaction. I
think this wouldn't make sense.
How else an export/import snapshot logic can be placed in this change?

LSN also should be set accurately. The current change does not handle LSN
properly.
I see that CREATE_REPLICATION_SLOT returns consistent_point which indicates
the earliest location which streaming can start from. And this
consistent_point is used as origin_startpos.
If that's the case, would it make sense to use "confirmed_flush_lsn" of the
replication slot in case the slot is being reused?
Since confirmed_flush_lsn can be considered as the safest, earliest
location which streaming can start from, I think it would work.

And at this point, with the correct LSN, I'm wondering whether this
export/import logic is really necessary if the worker does not create a
replication slot. What do you think?

For small tables, it could have a visible performance difference as it
> involves database write operations to each time create and drop the
> origin. But if we don't want to reuse then also you need to set its
> origin_lsn appropriately. Currently (without this patch), after
> creating the slot, we directly use the origin_lsn returned by
> create_slot API whereas now it won't be the same case as the patch
> doesn't create a slot every time.
>

Correct. For this issue, please consider the LSN logic explained above.

Thanks,
Melih

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marina Polyakova 2022-08-15 12:06:32 Re: ICU for global collation
Previous Message Damir Belyalov 2022-08-15 11:10:44 Fwd: Merging statistics from children instead of re-sampling everything