From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Apply
Date: 2025-09-16 18:40:07
Message-ID: d25177ef-0d1d-4489-9b2d-b6c43ad6677c@garret.ru
Lists: pgsql-hackers
On 11/08/2025 7:45 AM, Amit Kapila wrote:
> Hi,
>
>
> 4. Triggers and Constraints
> For the initial version, exclude tables with user-defined triggers or
> constraints from parallel apply due to complexity in dependency
> detection. We may need some parallel-apply-safe marking to allow this.
I think the problem is wider than just triggers and constraints.
Even if the database has no triggers and constraints, there can still be
causality violations.
If transactions at the subscriber are executed in a different order than on
the publisher, then it is possible to observe some "invalid" database state
which is never possible at the publisher. Take a very simple example: you
withdraw some money at an ATM from one account and then deposit it into
another account. These are two different transactions, and there are no
dependencies between them (they update different records). But if the
second transaction is committed before the first, one can see an incorrect
report where the total amount of money across all accounts exceeds the
real balance.

Another case is persisting a stream of events (with timestamps). It may be
confusing if the monotonicity of events is violated at the subscriber.

And there can be many other similar situations where there are no
"direct" data dependencies between transactions, but there are hidden
"indirect" dependencies. You have mentioned the most popular case:
foreign keys. Certainly, support for referential integrity constraints can
be added. But such dependencies can also exist without corresponding
constraints in the database schema.

You have also suggested adding an option which would force preserving commit
order. But my experiments with
`debug_logical_replication_streaming=immediate` show that in this case, for
short transactions, performance with parallel workers is even worse than
with a single apply worker.

Maybe it is possible to enforce some weaker commit order: do not try to
commit transactions in exactly the same order as at the publisher, but if
transaction T1 at the publisher is started after T2 is committed, then T1
can not be committed before T2 at the subscriber. Unfortunately, it is not
clear how to enforce such a "partial order" - `LogicalRepBeginData`
contains `finish_lsn`, but not `start_lsn`.
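
Just to make the idea concrete, here is a rough sketch of such a check. It
is purely illustrative: it assumes the protocol is extended with a
start_lsn field in the BEGIN message (which does not exist today), and the
struct and function names are made up:

    #include "postgres.h"
    #include "access/xlogdefs.h"

    typedef struct ParallelTxnInfo
    {
        TransactionId remote_xid;   /* transaction id on the publisher */
        XLogRecPtr    start_lsn;    /* hypothetical: LSN at which the txn started */
        XLogRecPtr    commit_lsn;   /* commit ("finish") LSN from the BEGIN message */
    } ParallelTxnInfo;

    /*
     * A parallel apply worker that has finished applying 'txn' may commit only
     * when every transaction that committed on the publisher before 'txn' even
     * started has already been committed on the subscriber.  The leader would
     * track the minimum commit LSN among the other still-uncommitted transactions.
     */
    static bool
    can_commit_now(const ParallelTxnInfo *txn,
                   XLogRecPtr min_other_uncommitted_commit_lsn)
    {
        return min_other_uncommitted_commit_lsn >= txn->start_lsn;
    }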

When I first read your proposal, and especially after seeing the concrete
results of its implementation, I decided that the parallel apply approach
is definitely better than the prefetch approach. But now I am not so sure.
Yes, parallel apply is about 2x faster than parallel prefetch. But parallel
prefetch still allows increasing LR speed 2-3 times without causing any
problems with deadlocks, constraints, triggers, ...
>
> Replication Progress Tracking
> -----------------------------------------
> Parallel apply introduces out-of-order commit application,
> complicating replication progress tracking. To handle restarts and
> ensure consistency:
>
> Track Three Key Metrics:
> lowest_remote_lsn: Starting point for applying transactions.
> highest_remote_lsn: Highest LSN that has been applied.
> list_remote_lsn: List of commit LSNs applied between the lowest and highest.
>
> Mechanism:
> Store these in ReplicationState: lowest_remote_lsn,
> highest_remote_lsn, list_remote_lsn. Flush these to disk during
> checkpoints similar to CheckPointReplicationOrigin.
>
> After Restart, Start from lowest_remote_lsn and for each transaction,
> if its commit LSN is in list_remote_lsn, skip it, otherwise, apply it.
> Once commit LSN > highest_remote_lsn, apply without checking the list.
>
> During apply, the leader maintains list_in_progress_xacts in the
> increasing commit order. On commit, update highest_remote_lsn. If
> commit LSN matches the first in-progress xact of
> list_in_progress_xacts, update lowest_remote_lsn, otherwise, add to
> list_remote_lsn. After commit, also remove it from the
> list_in_progress_xacts. We need to clean up entries below
> lowest_remote_lsn in list_remote_lsn while updating its value.
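
Just to check my understanding, the bookkeeping could look something like
this (a minimal sketch only; the struct name and the list helpers
first_in_progress_lsn/drop_lsns_below/add_lsn/remove_lsn are placeholders I
made up, not existing functions):

    #include "postgres.h"
    #include "access/xlogdefs.h"
    #include "nodes/pg_list.h"

    typedef struct ReplProgress
    {
        XLogRecPtr  lowest_remote_lsn;      /* all commits up to here are applied */
        XLogRecPtr  highest_remote_lsn;     /* highest commit LSN applied so far */
        List       *list_remote_lsn;        /* commit LSNs applied between the two */
        List       *list_in_progress_xacts; /* commit LSNs, in increasing commit order */
    } ReplProgress;

    /* Leader-side update when a transaction with 'commit_lsn' commits. */
    static void
    record_parallel_commit(ReplProgress *p, XLogRecPtr commit_lsn)
    {
        if (commit_lsn > p->highest_remote_lsn)
            p->highest_remote_lsn = commit_lsn;

        if (commit_lsn == first_in_progress_lsn(p))     /* oldest in-progress xact? */
        {
            p->lowest_remote_lsn = commit_lsn;
            /* clean up entries that are now below the new low watermark */
            p->list_remote_lsn = drop_lsns_below(p->list_remote_lsn, commit_lsn);
        }
        else
            p->list_remote_lsn = add_lsn(p->list_remote_lsn, commit_lsn);

        p->list_in_progress_xacts = remove_lsn(p->list_in_progress_xacts, commit_lsn);
    }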
>
> To illustrate how this mechanism works, consider the following four
> transactions:
>
> Transaction ID    Commit LSN
> 501               1000
> 502               1100
> 503               1200
> 504               1300
>
> Assume:
> Transactions 501 and 502 take longer to apply whereas transactions 503
> and 504 finish earlier. Parallel apply workers are assigned as
> follows:
> pa-1 → 501
> pa-2 → 502
> pa-3 → 503
> pa-4 → 504
>
> Initial state: list_in_progress_xacts = [501, 502, 503, 504]
>
> Step 1: Transaction 503 commits first and in RecordTransactionCommit,
> it updates highest_remote_lsn to 1200. In apply_handle_commit, since
> 503 is not the first in list_in_progress_xacts, add 1200 to
> list_remote_lsn. Remove 503 from list_in_progress_xacts.
> Step 2: Transaction 504 commits, Update highest_remote_lsn to 1300.
> Add 1300 to list_remote_lsn. Remove 504 from list_in_progress_xacts.
> ReplicationState now:
> lowest_remote_lsn = 0
> list_remote_lsn = [1200, 1300]
> highest_remote_lsn = 1300
> list_in_progress_xacts = [501, 502]
>
> Step 3: Transaction 501 commits. Since 501 is now the first in
> list_in_progress_xacts, update lowest_remote_lsn to 1000. Remove 501
> from list_in_progress_xacts. Clean up list_remote_lsn to remove
> entries < lowest_remote_lsn (none in this case).
> ReplicationState now:
> lowest_remote_lsn = 1000
> list_remote_lsn = [1200, 1300]
> highest_remote_lsn = 1300
> list_in_progress_xacts = [502]
>
> Step 4: System crash and restart
> Upon restart, start replication from lowest_remote_lsn = 1000. The first
> transaction encountered is 502 with commit LSN 1100; since that LSN is not
> present in list_remote_lsn, apply it. As transactions 503 and 504's
> respective commit LSNs [1200, 1300] are present in list_remote_lsn, we
> skip them. Note that it is each transaction's end_lsn/commit_lsn, which
> the apply worker receives along with the first transaction command BEGIN,
> that has to be compared against this list. This ensures correctness and
> avoids duplicate application of already committed transactions.
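
The restart-time decision would then be something like the following
(again only a sketch continuing the names from above; lsn_in_list() is a
placeholder, and I added a check against lowest_remote_lsn, which seems
implied):

    /*
     * After a restart, replication resumes from lowest_remote_lsn.  For each
     * transaction received again, 'commit_lsn' (taken from its BEGIN message)
     * tells us whether it was already applied before the crash.
     */
    static bool
    already_applied(const ReplProgress *p, XLogRecPtr commit_lsn)
    {
        if (commit_lsn > p->highest_remote_lsn)
            return false;       /* never applied before the crash */
        if (commit_lsn <= p->lowest_remote_lsn)
            return true;        /* everything up to the low watermark is applied */
        /* otherwise, applied before the crash only if recorded in the list */
        return lsn_in_list(p->list_remote_lsn, commit_lsn);
    }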
>
> Now, it is possible that some users may want to parallelize transactions
> but still want to maintain commit order because they don't explicitly
> declare FK or PK constraints on columns but maintain the integrity via the
> application. So, in such cases, as we won't be able to detect transaction
> dependencies, it would be better to make out-of-order commits optional.
>
> Thoughts?
>