Re: Parallel Apply

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Apply
Date: 2025-11-19 21:30:47
Message-ID: e4442437-6821-424f-a6bd-5ef8d9b5f9e9@vondra.me
Lists: pgsql-hackers

Hello Kuroda-san,

On 11/18/25 12:00, Hayato Kuroda (Fujitsu) wrote:
> Dear Amit,
>
>> It seems you haven't sent the patch that preserves commit order or the
>> commit message of the attached patch is wrong. I think the first patch
>> in series should be the one that preserves commit order and then we
>> can build a patch that tracks dependencies and allows parallelization
>> without preserving commit order.
>
> I think I attached the correct file. Since we are trying to preserve
> the commit order by default, everything was merged into one patch.

I agree the goal should be preserving the commit order, unless someone
can demonstrate (a) clear performance benefits and (b) correctness. It's
not clear to me how that would deal with e.g. crashes, where some of the
"future" replicated transactions have already committed. Maybe it's fine,
not sure. But keeping the same commit order just makes it easier to think
about the consistency model, no?

So it seems natural to target the same commit order first, and then
maybe explore if relaxing that would be beneficial for some cases.

However, the patch seems fairly large (~80kB, although a fair bit of
that is comments). Would it be possible to split it into smaller chunks?
Is there some "minimal patch" which could be moved to 0001, and then
followed by improvements in 0002, 0003, ...? I sometimes do the
"infrastructure" parts first, with the actual feature in the last patch
(simply building on the earlier parts).

I'm not saying it has to be split (or how exactly), but I personally
find smaller patches easier to review ...

> One point to clarify is that dependency tracking is essential even if we fully
> preserve the commit ordering, so as not to violate constraints like the PK.
> Assuming there is a table which has a PK, txn1 inserts a tuple and txn2 updates
> it. The UPDATE statement in txn2 must be applied after committing txn1.
>

Right. I don't see how we could do parallel apply correctly in the
general case without tracking these dependencies.

>> I feel it may be better to just
>> discuss preserve commit order patch that also contains some comments
>> as to how to extend it further, once that is done, we can do further
>> discussion of the other patch.
>
> I do agree, let me implement one by one.
>

Some comments / questions after looking at the patch today:

1) The way the patch determines dependencies seems to be the "writeset"
approach from other replication systems (e.g. MySQL does that). Maybe we
should stick to the same naming?

2) If I understand correctly, the patch maintains a "replica_identity"
hash table, with replica identity keys for all changes for all
concurrent transactions. How expensive can this be, in terms of CPU and
memory? What if I have multiple large batch transactions, each updating
millions of rows?
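
Just to put a rough number on this, here's a trivial back-of-the-envelope
sketch (the struct layout and the ~50 bytes/entry overhead are my guesses,
not what the patch actually allocates):

    /*
     * Self-contained sketch, only to reason about the memory cost of
     * keeping one hash entry per modified row for all in-progress
     * transactions. Everything here is an assumption, not the patch.
     */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct WritesetEntry
    {
        uint32_t    relid;       /* table OID */
        uint64_t    key_hash;    /* hash of the replica identity columns */
        uint32_t    writer_xid;  /* last in-progress xact writing the key */
    } WritesetEntry;

    int
    main(void)
    {
        double  bytes_per_entry = 50.0;     /* payload + hash table overhead */
        double  rows = 4.0 * 1000000.0;     /* four concurrent 1M-row updates */

        printf("approx. writeset size: %.0f MB\n",
               rows * bytes_per_entry / (1024.0 * 1024.0));
        return 0;
    }

That's not necessarily a deal-breaker, but it's the kind of number I'd
like to see discussed (and perhaps capped somehow).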

3) Would it make sense to use some alternative data structure? A bloom
filter, for example. Just a random idea, not sure if that's a good fit.
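
IIRC we already have a simple bloom filter implementation in
src/include/lib/bloomfilter.h (used by amcheck), which might be reusable
here. The idea would be something like the toy sketch below (sizing and
hashing are purely my assumptions): each in-progress transaction keeps a
small filter of its replica identity key hashes, a miss proves there is
no dependency, and a false positive only causes an unnecessary wait,
never a correctness problem.

    #include <stdbool.h>
    #include <stdint.h>

    #define FILTER_BITS   (1 << 20)     /* 128 kB per in-progress transaction */

    typedef struct TxnFilter
    {
        uint8_t bits[FILTER_BITS / 8];
    } TxnFilter;

    static void
    filter_add(TxnFilter *f, uint64_t key_hash)
    {
        /* two derived probes; real code would use k independent hashes */
        uint64_t    h1 = key_hash % FILTER_BITS;
        uint64_t    h2 = (key_hash >> 32) % FILTER_BITS;

        f->bits[h1 / 8] |= (uint8_t) (1 << (h1 % 8));
        f->bits[h2 / 8] |= (uint8_t) (1 << (h2 % 8));
    }

    static bool
    filter_maybe_contains(const TxnFilter *f, uint64_t key_hash)
    {
        uint64_t    h1 = key_hash % FILTER_BITS;
        uint64_t    h2 = (key_hash >> 32) % FILTER_BITS;

        return (f->bits[h1 / 8] & (1 << (h1 % 8))) != 0 &&
               (f->bits[h2 / 8] & (1 << (h2 % 8))) != 0;
    }

The trade-off is memory vs. false positive rate - for large transactions
the per-transaction state stays O(kB) instead of O(rows).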

4) I've seen the benchmarks posted a couple days ago, and I'm running
some tests myself. But it's hard to say if the result is good or bad
without knowing what fraction of transactions finds a dependency and has
to wait for an earlier one. Would it be possible to track this
somewhere? Is there a suitable pg_stat_* view?

5) It's not clear to me how you measured the TPS in your benchmark. Did
you measure how long it takes for the standby to catch up, or something
else?

6) Did you investigate why the speedup is just ~2.1x with 4 workers, i.e.
about half of the "ideal" speedup? Is it bottlenecked on WAL, on the
leader having to determine dependencies, or on something else?

7) I'm a bit confused about the different types of dependencies, and at
which point they make the workers wait. There are the dependencies due
to modifying the same row, in which case the worker waits before
applying the change that hits the dependency. And then there's a
dependency to enforce commit order, in which case it waits right before
the commit. Right? Or did I get that wrong?
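
Just to make sure we're talking about the same thing, this is my mental
model, written as pseudocode (all the helper names are made up for
illustration, not the patch's actual API):

    static void
    apply_transaction(Transaction *txn)
    {
        for (Change *change = first_change(txn); change != NULL;
             change = next_change(change))
        {
            /*
             * (a) row-level dependency: an earlier, still-unapplied
             * transaction wrote the same replica identity key, so wait for
             * it to commit before applying this particular change.
             */
            Transaction *dep = lookup_writeset_dependency(change);

            if (dep != NULL)
                wait_for_commit(dep);

            apply_change(change);
        }

        /*
         * (b) commit-order dependency: even without any row conflicts, wait
         * until all transactions that committed earlier on the publisher
         * have committed here, to preserve the commit order.
         */
        wait_for_preceding_commits(txn);
        commit_transaction(txn);
    }

If that's roughly right, it'd be good to spell it out in the design
comment.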

8) The commit message says:

> It would be a challenge to check the dependency if the table has
> user-defined triggers or constraints. The most viable solution might be
> to disallow parallel apply for relations whose triggers and constraints
> are not marked as parallel-safe or immutable.

Wouldn't this run into the same issues with verifying these properties
on partitioned tables as the patch that attempted to allow parallelism
for INSERT ... SELECT [1]? AFAICS that was too expensive to do for large
partitioning hierarchies.

9) I think it'd be good to make sure the "design" comments explain how
the new parts work in more detail. For example, the existing comment at
the beginning of applyparallelworker.c goes into a lot of detail, but
the patch adds only two fairly short paragraphs. Even the commit message
has more detail, which seems a bit strange.

10) For example, it would be good to explain what "internal dependency"
and "internal relation" are for. I think I understand the internal
dependency, but I'm still not quite sure why we need the internal
relation (or rather, why we didn't need it before).

11) I think it might be good to have TAP tests that stress this in
various ways. Say, a test that randomly restarts the standby during
parallel apply and checks that no records are lost, etc. In the online
checksums patch this was quite useful. It wouldn't be part of the
regular check-world, of course. Or maybe it'd be for development only?

regards

[1]
https://www.postgresql.org/message-id/flat/E1lJoQ6-0005BJ-DY%40gemulon.postgresql.org

--
Tomas Vondra
