From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Logical replication prefetch
Date: 2025-07-14 06:35:53
Message-ID: facc2fa1-31f4-48d4-9588-1165ebafa620@garret.ru
Lists: pgsql-hackers
On 14/07/2025 4:20 am, Zhijie Hou (Fujitsu) wrote:
> Thank you for the proposal! I find it to be a very interesting feature.
>
> I tested the patch you shared in your original email and encountered potential
> deadlocks when testing a pgbench TPC-B-like workload. Could you please provide
> an updated patch version so that I can conduct further performance experiments?
Sorry, that deadlock was already fixed in my repo:
https://github.com/knizhnik/postgres/pull/3
The updated patch is attached.
> Additionally, I was also exploring ways to improve performance and have tried an
> alternative version of prefetch for experimentation. The alternative design is
> that we assign each non-streaming transaction to a parallel apply worker, while
> strictly maintaining the order of commits. During parallel apply, if the
> transactions that need to be committed before the current transaction are not
> yet finished, the worker performs prefetch operations. Specifically, for
> updates and deletes, the worker finds and caches the target local tuple to be
> updated/deleted. Once all preceding transactions are committed, the parallel
> apply worker uses these cached tuples to execute the actual updates or deletes.
> What do you think about this alternative? I think the alternative might offer
> more stability in scenarios where shared buffer eviction occurs frequently,
> and it avoids leaving dead tuples in the buffer. However, it also presents some
> drawbacks, such as the need to add wait events to maintain commit order,
> compared to the approach discussed in this thread.
So, as far as I understand, your PoC does the same thing as approach 1 in my
proposal - prefetch of the replica identity - except that the prefetch is done
not by dedicated prefetch workers but by the normal parallel apply workers
while they have to wait until the previous transaction is committed. I
consider this more complex, but it may be more efficient than my approach.
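Just to make sure we are talking about the same thing, here is a minimal
sketch of such a replica-identity prefetch. prefetch_replident() is a
hypothetical helper; RelationFindReplTupleByIndex(), RelationFindReplTupleSeq()
and table_slot_create() are the functions the normal apply path already uses:

#include "postgres.h"
#include "access/tableam.h"
#include "executor/executor.h"
#include "nodes/lockoptions.h"
#include "utils/relcache.h"

/*
 * Hypothetical helper: locate the target tuple by replica identity and
 * immediately throw the result away.  The lookup pulls the index and heap
 * pages into shared buffers, so the later "real" update/delete finds them
 * cached.  Note that RelationFindReplTupleByIndex() also locks the tuple,
 * which a production prefetch would probably want to avoid.
 */
static void
prefetch_replident(Relation rel, TupleTableSlot *remoteslot)
{
    Oid         idxoid = RelationGetReplicaIndex(rel);
    TupleTableSlot *localslot = table_slot_create(rel, NULL);

    if (OidIsValid(idxoid))
        (void) RelationFindReplTupleByIndex(rel, idxoid, LockTupleShare,
                                            remoteslot, localslot);
    else
        (void) RelationFindReplTupleSeq(rel, LockTupleShare,
                                        remoteslot, localslot);

    ExecDropSingleTupleTableSlot(localslot);
}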
The obvious drawback of both your approach and mine is that they prefetch
only the pages of the primary index (replica identity). If there are other
indexes whose keys are changed by an update, the pages of those indexes will
still be read from disk when the update is applied. The same is true for
inserts (in that case the new tuple always has to be inserted into all
indexes). This is why I have also implemented another approach: apply the
operation in the prefetch worker and then roll back the transaction.
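In the prefetch worker this looks roughly like the following (a sketch only;
the elided middle part stands for the normal apply code path):

    StartTransactionCommand();
    PushActiveSnapshot(GetTransactionSnapshot());

    /*
     * Apply the insert/update/delete here exactly as the apply worker
     * would; this visits the heap and *all* indexes of the target
     * relation, not just the replica identity.
     */

    PopActiveSnapshot();

    /*
     * Roll back: the data is left unchanged, but the touched pages stay
     * in shared buffers.  The downside, as you note, is that an aborted
     * insert still leaves dead heap and index entries behind.
     */
    AbortCurrentTransaction();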
Also, I do not quite understand how you handle invalidations. Assume that
we have two transactions, T1 and T2:

T1: ... W1 Commit
T2: ... W1

So T1 writes tuple 1 and then commits its transaction, and T2 then updates
tuple 1. If I understand your approach correctly, the parallel apply worker
for T2 will try to prefetch tuple 1 before T1 is committed, and in that case
it will get the old version of the tuple. That is not a problem as long as
the parallel apply worker repeats the lookup instead of just using the
cached tuple.
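In other words, the cached tuple can only be treated as a hint, something
like this (PrefetchEntry and prefetch_cache_is_stale() are made-up names,
just to illustrate the point):

static bool
fetch_target_tuple(Relation rel, Oid idxoid, TupleTableSlot *remoteslot,
                   PrefetchEntry *entry,    /* hypothetical cache entry */
                   TupleTableSlot *localslot)
{
    /* Use the cached tuple only if nothing committed since it was cached. */
    if (entry->cached && !prefetch_cache_is_stale(entry))
    {
        (void) ExecCopySlot(localslot, entry->cached_slot);
        return true;
    }

    /*
     * The cache may hold the pre-commit version of the tuple (T1 had not
     * yet committed when T2 prefetched it), so repeat the lookup.
     */
    return RelationFindReplTupleByIndex(rel, idxoid, LockTupleExclusive,
                                        remoteslot, localslot);
}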
One more point: since you assign each non-streaming transaction to a parallel
apply worker, the number of concurrently applied transactions is limited by
the number of background workers, which is usually not very large (~10). So if
there were 100 concurrent transactions at the publisher, the subscriber would
still be able to apply only a few of them at a time. In this sense my approach
with separate prefetch workers is more flexible: each prefetch worker can
prefetch as many operations as it can.
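For reference, on the subscriber side that limit comes from the background
worker pool, e.g. (the GUCs are real, the values just illustrative):

    max_worker_processes = 16
    max_logical_replication_workers = 10
    max_parallel_apply_workers_per_subscription = 8

So with 100 concurrent publisher transactions, at most ~8-10 of them could
be assigned their own apply worker at any moment.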
Attachment: v2-0001-logical-replication-prefetch.patch (text/plain, 31.8 KB)