Re: Proposal: Conflict log history table for Logical Replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-12-04 07:01:56
Message-ID: CAHut+PuvkGKW=z8L3nR6dCQxa1FHxC9A1wrbOG_S92fh90zTNg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi. Some review comments for v9-0001.

======
Commit message.

1.
Note: A single remote tuple may conflict with multiple local conflict
when conflict type
is CT_MULTIPLE_UNIQUE_CONFLICTS, so for handling this case we create a
single row in
conflict log table with respect to each remote conflict row even if it
conflicts with
multiple local rows and we store the multiple conflict tuples as a
single JSON array
element in format as
[ { "xid": "1001", "commit_ts": "...", "origin": "...", "tuple": {...} }, ... ]
We can extract the elements from local tuple as given in below example

~

Something seems broken/confused with this description:

1a.
"A single remote tuple may conflict with multiple local conflict"
Should that say "... with multiple local tuples" ?

~

1b.
There is a mixture of terminology here, "row" vs "tuple", which
doesn't seem correct.

~

1c.
"We can extract the elements from local tuple"
Should that say "... elements of the local tuples from the CLT row ..."

======
src/backend/replication/logical/conflict.c

2.
+
+#define N_LOCAL_CONFLICT_INFO_ATTRS 4

I felt it would be better to put this where it is used. e.g. IMO put
it within the build_conflict_tupledesc().

~~~

InsertConflictLogTuple:

3.
+ /* A valid tuple must be prepared and store in MyLogicalRepWorker. */

Typo still here: /store in/stored in/

~~~

4.
+static TupleDesc
+build_conflict_tupledesc(void)
+{
+ TupleDesc tupdesc;
+
+ tupdesc = CreateTemplateTupleDesc(N_LOCAL_CONFLICT_INFO_ATTRS);
+
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid",
+ XIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "commit_ts",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "origin",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "tuple",
+ JSONOID, -1, 0);

If you had some incrementing attno instead of hard-wiring the
(1,2,3,4) then you'd be able to add a sanity check like Assert(attno +
1 == N_LOCAL_CONFLICT_INFO_ATTRS); that can safeguard against future
mistakes in case something changes without updating the constant.

~~~

build_local_conflicts_json_array:

5.
+ /* Process local conflict tuple list and prepare a array of JSON. */
+ foreach(lc, conflicttuples)
{
- tableslot = table_slot_create(localrel, &estate->es_tupleTable);
- tableslot = ExecCopySlot(tableslot, slot);
+ ConflictTupleInfo *conflicttuple = (ConflictTupleInfo *) lfirst(lc);

5a.
typo in comment: /a array/an array/

~

5b.
SUGGESTION
foreach_ptr(ConflictTupleInfo, conflicttuple, confrlicttuples)
{

~~~

6.
+ i = 0;
+ foreach(lc, json_datums)
+ {
+ json_datum_array[i] = (Datum) lfirst(lc);
+ json_null_array[i] = false;
+ i++;
+ }

6a.
The loop seemed to be unnecessarily complicated since you already know
the size. Isn't it the same as below?

SUGGESTION
for (int i = 0; i < num_conflicts; i++)
{
json_datum_array[i] = (Datum) list_nth(json_datums, i);
json_null_array[i] = false;
}

6b.
Also, there is probably no need to do json_null_array[i] = false; at
every iteration here, because you could have just used palloc0 for the
whole array in the first place.

======
src/test/regress/expected/subscription.out

7.
+-- check if the table exists and has the correct schema (15 columns)
+SELECT count(*) FROM pg_attribute WHERE attrelid =
'public.regress_conflict_log1'::regclass AND attnum > 0;
+ count
+-------
+ 11
+(1 row)
+

That comment is wrong; there aren't 15 columns anymore.

~~~

8.
(mentioned in a previous review)

I felt that \dRs should display the CLT's schema name in the "Conflict
log table" field -- at least when it's not "public". Otherwise, it
won't be easy for the user to know it.

I did not see a test case for this.

~~~

9.
(mentioned in a previous review)

You could have another test case to explicitly call the function
pg_relation_is_publishable(clt) to verify it returns false for a CTL
table.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bilal Yavuz 2025-12-04 07:09:33 Re: [PATCH] Add `headerscheck` run_target to meson
Previous Message Brandon Tat 2025-12-04 07:01:31 Re: Improve error reporting in 027_stream_regress test