Re: Proposal: Conflict log history table for Logical Replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2026-06-23 08:52:41
Message-ID: CALDaNm0siWVCuTg6S8Xbk5SMHc91AX8Y+fAtOyV+cjwcpiFDWQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 22 Jun 2026 at 20:52, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Jun 22, 2026 at 4:32 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > On Sun, Jun 21, 2026 at 7:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jun 18, 2026 at 9:33 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Jun 16, 2026 at 6:54 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > > IMHO we should just log WARNING and continue the apply worker on
> > > > > conflict insertion failure, lets see what other thinks on this.
> > > > >
> > > >
> > > > I have the same opinion. Allowing CLT to block the apply worker would
> > > > be undesirable; CLT is a history/logs collection feature and should
> > > > not interrupt core logical replication work.
> > > >
> > >
> > > I think the insert can fail in rare cases like disk getting full while
> > > writing WAL or some internal memory ERROR and the ERROR could be
> > > persistent which means the LOG will be filled with the same WARNING if
> > > there are many conflicts. Also, users may not like missing out on
> > > conflict information. So, we can ERROR out and let users fix the
> > > situation. Additionally, the nested try-catch to downgrade ERROR to
> > > WARNING also looks ugly and a source of future bugs and maintenance
> > > burden. The attached patch tries to fix this by ERRORing out on
> > > insertion failure and attaching the required conflict info as a
> > > context of ERROR. The patch also improved the ReportApplyConflict()
> > > non-ERROR paths by displaying the conflict information in server LOGs
> > > before inserting the same into CLT so that if insertion fails, the
> > > complete conflict info can be present in server LOGs. See
> > > v52-1-amit.Improve-error-handling-for-conflict-log-table-ins.
> > >
> > > Additionally, there is another problem with 0003 where when a parallel
> > > apply worker hits an ERROR-level conflict it logs the conflict to the
> > > conflict log table in a new transaction in its error path, after
> > > aborting the failed apply transaction. But the leader detects worker
> > > failure in pa_wait_for_xact_finish() by waiting on the worker's
> > > transaction lock, and AbortOutOfAnyTransaction() releases that lock:
> > > the leader unblocks, sees a non-finished state, raises "lost
> > > connection to the logical replication parallel apply worker", and
> > > tears the worker down -- which can SIGTERM it mid-insert and lose the
> > > conflict log row, besides being a misleading message. The attached
> > > top-up patch v52-2-amit.fix_parallel_apply_logging fixes that by
> > > introducing PARALLEL_TRANS_ERROR state.
> >
> > I reproduced the above issue and verified the fix for it in
> > v52-2-amit.fix_parallel_apply_logging. Here is a TAP test for the
> > same.
> > The attached top-up patch applies on top of the latest v53-0005 patch.
> >
> I have merged Amit's patch and Nisha's tap test patch and tested all
> the cases discussed, and those are working fine.

Few comments:
1) Currently we are storing these in shared memory. Looking at the
implementation, these fields are purely worker-private state used to
ferry data across the error boundary from prepare_conflict_log_tuple()
(inside the PG_TRY block) to ProcessPendingConflictLogTuple() (inside
the PG_CATCH block).If it is not required by another process, should
it be moved out of shared memory.
+ /* A conflict log tuple that is prepared but not yet inserted. */
+ HeapTuple conflict_log_tuple;
+
+ /*
+ * Error-context string describing the conflict above, used to
annotate any
+ * error raised while inserting conflict_log_tuple into the conflict log
+ * table. Allocated, like conflict_log_tuple, in ApplyContext.
+ */
+ char *conflict_log_errcontext;

2) Should conflict_log_errcontext be initialized in
logicalrep_worker_launch like other members:
+++ b/src/include/replication/worker_internal.h
@@ -100,6 +100,16 @@ typedef struct LogicalRepWorker
*/
TransactionId oldest_nonremovable_xid;

+ /* A conflict log tuple that is prepared but not yet inserted. */
+ HeapTuple conflict_log_tuple;
+
+ /*
+ * Error-context string describing the conflict above, used to
annotate any
+ * error raised while inserting conflict_log_tuple into the conflict log
+ * table. Allocated, like conflict_log_tuple, in ApplyContext.
+ */
+ char *conflict_log_errcontext;

3) Is the usage of heap_insert intentional here, shouldn't it be using
generic table access method API table_tuple_insert?
+InsertConflictLogTuple(Relation conflictlogrel)
+{
+ ErrorContextCallback errcallback;
+
+ /* A valid tuple must be prepared and stored in MyLogicalRepWorker. */
+ Assert(MyLogicalRepWorker->conflict_log_tuple != NULL);
+
+ /*
+ * Set up an error context so that a failure to insert (e.g.
the conflict
+ * log table was dropped, or an out-of-space error) carries information
+ * identifying the conflict we were trying to log.
+ */
+ errcallback.callback = conflict_log_insert_errcontext;
+ errcallback.arg = MyLogicalRepWorker->conflict_log_errcontext;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ heap_insert(conflictlogrel, MyLogicalRepWorker->conflict_log_tuple,
+ GetCurrentCommandId(true),
HEAP_INSERT_NO_LOGICAL, NULL);

4) Is the condition remote_commit_ts > 0 done intentionally?
+ if (remote_commit_ts > 0)
+ values[attno++] = TimestampTzGetDatum(remote_commit_ts);
+ else
+ nulls[attno++] = true;

As I had seen some negative values for certain timestamps. Shouldn't
the check be != 0?
SELECT extract(epoch FROM '1969-12-31 23:59:59+00'::timestamptz);
extract
-----------
-1.000000
(1 row)

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Akshay Joshi 2026-06-23 08:57:54 Re: [PATCH] Add pg_get_table_ddl() to reconstruct CREATE TABLE statements
Previous Message Amit Kapila 2026-06-23 08:52:29 Re: Add a hook for handling logical decoding messages on subscribers.