RE: Parallel Apply

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Parallel Apply
Date: 2025-09-17 05:18:37
Message-ID: TY4PR01MB16907DCD204FB9BBAD4E6206F9417A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wednesday, September 17, 2025 2:40 AM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
> On 11/08/2025 7:45 AM, Amit Kapila wrote:
> > 4. Triggers and Constraints
> > For the initial version, exclude tables with user-defined triggers or
> > constraints from parallel apply due to complexity in dependency detection.
> > We may need some parallel-apply-safe marking to allow this.
>
> I think that the problem is wider than just triggers and constraints.
>
> Even if the database has no triggers or constraints, there can still be
> causality violations.
>
> If transactions at the subscriber are executed in a different order than on
> the publisher, then it is possible to observe an "invalid" database state
> that is never possible at the publisher. Consider a very simple example: you
> withdraw some money at an ATM from one account and then deposit it into
> another account. These are two different transactions, and there are no
> dependencies between them (they update different records). But if the second
> transaction is committed before the first, then we can see an incorrect
> report where the total amount of money across all accounts exceeds the real
> balance. Another case is when you are persisting a stream of events (with
> timestamps). It may be confusing if the monotonicity of the events is
> violated at the subscriber.
>
> And there can be many other similar situations where there are no "direct"
> data dependencies between transactions, but there are hidden "indirect"
> dependencies. The most common case is the one you mentioned: foreign keys.
> Support for referential integrity constraints can certainly be added. But
> such dependencies can exist without corresponding constraints in the
> database schema.

Yes, I agree that these situations can occur, which is why we suggest offering
out-of-order commit as an option while preserving commit order by default.
However, I think not all use cases are affected by indirect dependencies,
because we ensure eventual consistency with out-of-order commit anyway.
Additionally, databases like Oracle and MySQL support out-of-order parallel
apply, IIRC.
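
To make the trade-off concrete, here is a minimal sketch of the ATM example
from the quoted text. It is only an illustration: the two-account ledger, the
amounts, and the helper `observed_totals` are all hypothetical and are not
taken from the patch. It shows that reordering the two commits exposes a
transient total that the publisher never had, while the final state still
converges.

```python
# Hypothetical two-account ledger; names and amounts are illustrative only.
INITIAL = {"A": 1000, "B": 1000}

# Two transactions in publisher commit order. They touch different rows,
# so row-level dependency detection would treat them as independent.
TXNS = [
    ("withdraw", "A", -100),
    ("deposit", "B", +100),
]


def observed_totals(order):
    """Commit TXNS in the given order.

    Returns the total balance visible after each commit, plus the final state.
    """
    state = dict(INITIAL)
    totals = []
    for idx in order:
        _, account, delta = TXNS[idx]
        state[account] += delta
        totals.append(sum(state.values()))
    return totals, state


# Publisher order: withdraw first, then deposit.
publisher_totals, publisher_final = observed_totals((0, 1))
# Out-of-order commit on the subscriber: deposit first, then withdraw.
reordered_totals, reordered_final = observed_totals((1, 0))

print("publisher totals:", publisher_totals)
print("reordered totals:", reordered_totals)
print("same final state:", publisher_final == reordered_final)
```

In this sketch the reordered run briefly reports a total above the initial
2000, which a reader on the subscriber could observe, but both runs end in the
same final state. Whether that transient state matters depends on the use
case.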

>
> You have also suggested adding an option that will force preserving commit
> order. But my experiments with `debug_logical_replication_streaming=immediate`
> show that in this case, for short transactions, performance with parallel
> workers is even worse than with a single apply worker.

I think debug_logical_replication_streaming=immediate differs from real parallel
apply. It wasn't designed to simulate genuine parallel apply, because it
restricts parallelism by requiring the leader to wait for each transaction to
complete on commit. To achieve in-order parallel apply, each parallel apply
worker should wait for the preceding transaction to finish, similar to the
dependency wait in the current POC patch. We plan to extend the patch to support
in-order parallel apply and will test its performance.
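
The in-order idea described above can be sketched as follows. This is a toy
model under stated assumptions, not the POC patch's implementation: Python
threads stand in for parallel apply workers, `threading.Event` stands in for
whatever wait mechanism the patch uses, and `run_in_order` is a hypothetical
name. Each worker applies its changes independently and only waits for its
predecessor at commit time.

```python
import threading


def run_in_order(num_txns):
    """Apply transactions concurrently, but commit them in publisher order."""
    # committed[i] is set once transaction i has committed.
    committed = [threading.Event() for _ in range(num_txns)]
    commit_log = []
    log_lock = threading.Lock()

    def worker(i):
        # The "apply" phase can overlap freely between workers.
        _applied = f"changes of txn {i}"
        # Before committing, wait for the preceding transaction's commit.
        if i > 0:
            committed[i - 1].wait()
        with log_lock:
            commit_log.append(i)
        committed[i].set()

    # Start workers in reverse to show that start order does not matter.
    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in reversed(range(num_txns))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return commit_log


commit_order = run_in_order(5)
print(commit_order)
```

Unlike a leader that waits for each transaction to complete before dispatching
the next one, only the commit step is serialized here, so the apply work of
different transactions can still overlap.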

Best Regards,
Hou zj
