RE: Parallel Apply

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Parallel Apply
Date: 2026-04-30 14:39:24
Message-ID: TYRPR01MB14195F002941AF23A1350F00D94352@TYRPR01MB14195.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, April 29, 2026 2:51 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Apr 27, 2026 at 4:10 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear hackers,
> >
> > Here is new patch set. The Option 2 was chosen for the format of
> > internal messages based on the discussion.
> >
>
> Thanks. Please find a few comments:

Thanks for the comments.

> 5)
> Now there are 3 types of hash-tables:
>
> HTAB used for existing ParallelApplyTxnHash dynamic-hash for
> parallelized_txns simplehash for replica_identity_table
>
> a)
> What was the design rationale behind choosing different hash table
> implementations for these different uses? Is there a comment, prior discussion,
> or email thread where this reasoning has been explained?

For parallelized_txns, the hash table must reside in shared memory and needs to
grow dynamically based on the number of transactions. dshash is the only natural choice
here: simplehash does not seem to be designed for shared memory, and HTAB (which
does support shared memory) cannot expand its size on the fly.

For replica_identity_table, we need maximum efficiency since every change
requires a hash operation. simplehash is the better choice here. See the
comments atop simplehash.h for details.

For ParallelApplyTxnHash (used per large transaction with a simple XID key),
we prioritize ease of setup, so HTAB is sufficient.

I've added brief comments for new hash tables in the patch set.

>
> b)
> Regarding names, was it a conscious decision to name parallelized_txns and
> replica_identity_table in snake_case case different from existing
> 'ParallelApplyTxnHash'? I know other code using DSA often follows snake_case,
> but having ParallelApplyTxnHash and parallelized_txns defined next to each
> other feels a bit inconsistent. Also, I wonder whether hash table objects should
> perhaps be named more like ParallelApplyTxnHash, so they stand out while
> reading the code; otherwise they mostly look like ordinary local variables,
> which they are not.

I don't have a strong opinion on the naming of the hash tables. I'll wait to
see if others have suggestions, as renaming is easy to address once the
patches become stable.

>
> 6)
> cleanup_committed_replica_identity_entries:
>
> + if (!skipped_write && !XLogRecPtrIsValid(pos->local_end))
> + continue;
>
> We can use XLogRecPtrIsInvalid instead.

XLogRecPtrIsInvalid() is a deprecated macro, so we should avoid using it.

Other comments have been addressed in the latest patch set.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2026-04-30 14:40:17 RE: Parallel Apply
Previous Message Zhijie Hou (Fujitsu) 2026-04-30 14:38:45 RE: Parallel Apply