Re: Parallel Apply

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: Re: Parallel Apply
Date: 2026-05-13 05:24:20
Message-ID: CAJpy0uDCxK2wiM-uMYSMKkrp+pYOHxE5ns4ihFvht=JmytVJ3A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 6, 2026 at 9:18 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Apr 30, 2026 at 8:10 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, April 29, 2026 7:32 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > >
> > > Few more comments on v17-003:
> > >
> > >
> >
> > Thanks for the comments, I have addressed all of them.
> >
> > Here is the latest patch set.
>
> Thank you. I seem to have missed this email thread (as it was split
> into a new thread) and was waiting for the patches. I’ve just noticed
> it now and will resume the review.
>

Please find a few comments for patch003 mostly:

1)
* depended on by other transactions. Entries are of type ParallelizedTxnEntry.
*
* dshash is used to enable dynamic shared memory allocation based on
the number
- * of transactions being applied in parallel.
+ * of transactions being applied in parallel. Entries are of type
ParallelizedTxnEntry.
*/
static dsa_area *parallel_apply_dsa_area = NULL;
static dshash_table *parallelized_txns = NULL;

'Entries are of type ParallelizedTxnEntry' repeated twice in this comment.

2)
cleanup_committed_replica_identity_entries:

+ if (!skipped_write && !XLogRecPtrIsValid(pos->local_end))
+ continue;

Perhaps a comment will help to indicate above checks means a txn not
yet finished.

3)
Can you please clarify the scope, life-span of entries in
parallelized_txns vs ParallelApplyTxnHash. Both have remote-xid field.
So at any point of time, do both tables will have same number of
entries or if entries in one has bigger life-span/scope as compared to
other? It will be good to briefly mention these atop the hash-tables.

4)
+/*
+ * Hash table storing replica identity values for changes being applied in
+ * parallel, along with the last transaction that modified each row.
...
+static replica_identity_hash *replica_identity_table = NULL;

Regarding 'along with the last transaction that modified each row', is
'remote_xid' in ReplicaIdentityEntry is the last transaction that
modified this row or the one which is currently modifying it?

5)
Since we have added comments for rest for the fields for below
existing structure, do you think we can update comment for 'xid' as
well to say it is remote-one. It does not mention it anywhere in
comment.

typedef struct ParallelApplyWorkerEntry
{
TransactionId xid; /* Hash key -- must be first */

6)
003' commit message says about RI table entry:

Entries are deleted when transactions committed by parallel workers
are gathered, or the number of entries exceeds the limit.
--
I don't understand what do we mean by "when transactions committed by
parallel workers are gathered". Can we please make it more
clear/elaborate.

~~

Reviewing further.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2026-05-13 05:27:13 Re: [Bug] Add the missing RTE_GRAPH_TABLE case to transformLockingClause()
Previous Message jian he 2026-05-13 05:19:36 Re: [PATCH] Rebuild CHECK constraints after generated column SET EXPRESSION