From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication prefetch
Date: 2025-07-11 14:19:03
Message-ID: 26dcc7a3-c3c1-44a4-87e0-bfc68fe7901d@garret.ru
Lists: pgsql-hackers
On 08/07/2025 2:51 pm, Amit Kapila wrote:
> On Tue, Jul 8, 2025 at 12:06 PM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> There is a well-known Postgres problem: a logical replication subscriber
>> cannot catch up with the publisher, because LR changes are applied by a
>> single worker while at the publisher changes are made by multiple
>> concurrent backends. The problem is not specific to logical replication:
>> the physical replication stream is also handled by a single walreceiver.
>> But for physical replication Postgres now implements prefetch: looking at
>> WAL record blocks, it is quite easy to predict which pages will be
>> required for redo and to prefetch them. With logical replication the
>> situation is much more complicated.
>>
>> My first idea was to implement parallel apply of transactions. But to do
>> it we need to track dependencies between transactions. Right now
>> Postgres can apply transactions in parallel, but only if they are
>> streamed (which is done only for large transactions), and it serializes
>> them by commit order. It is possible to enforce parallel apply of short
>> transactions using `debug_logical_replication_streaming`, but then
>> performance is ~2x slower than with sequential apply by a single worker.
>>
> What is the reason for such a large slowdown? Is it because the amount
> of network transfer has increased without giving any significant
> advantage because of the serialization of commits?
It is not directly related to the subject, but I do not understand this code:
```
/*
 * Stop the worker if there are enough workers in the pool.
 *
 * XXX Additionally, we also stop the worker if the leader apply worker
 * serialized part of the transaction data due to a send timeout. This is
 * because the message could be partially written to the queue and there
 * is no way to clean the queue other than resending the message until it
 * succeeds. Instead of trying to send the data which anyway would have
 * been serialized and then letting the parallel apply worker deal with
 * the spurious message, we stop the worker.
 */
if (winfo->serialize_changes ||
	list_length(ParallelApplyWorkerPool) >
	(max_parallel_apply_workers_per_subscription / 2))
{
	logicalrep_pa_worker_stop(winfo);
	pa_free_worker_info(winfo);
	return;
}
```
It stops a worker if the number of workers in the pool is more than half of
`max_parallel_apply_workers_per_subscription`.
What I see is that `pa_launch_parallel_worker` spawns a new worker, and
after completion of the transaction it is immediately terminated.
This actually leads to an awful slowdown of the apply process.
If I just disable this check, so that all
`max_parallel_apply_workers_per_subscription` workers are actually used for
applying transactions, then the time of parallel apply with 4 workers is 6
minutes, compared with 10 minutes for applying all transactions by the main
worker. It is still not such a large improvement, but at least it is an
improvement and not a degradation.
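For context, the configuration being compared above can be set up along these lines. The GUC and subscription-option names are real (PG 16+); the connection string and the `sub`/`pub` object names are placeholders:

```
-- Publisher (developer option): stream every transaction immediately
-- instead of only large ones, forcing parallel apply of short transactions.
SET debug_logical_replication_streaming = 'immediate';

-- Subscriber: cap on parallel apply workers per subscription.
ALTER SYSTEM SET max_parallel_apply_workers_per_subscription = 4;

-- The subscription must request parallel streaming apply.
CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub
    WITH (streaming = parallel);
```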