| From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shvetamalik(at)gmail(dot)com> |
| Subject: | Re: Proposal: Conflict log history table for Logical Replication |
| Date: | 2026-06-03 09:54:52 |
| Message-ID: | CABdArM7+rHeod_Rt2AFVV-CmgHK0uOg7tsonY4GPxq9=9ZwX4w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jun 3, 2026 at 10:11 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, 2 Jun 2026 at 10:51, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > PFA, latest version of patch
> >
> > Changes:
> > 1. Includes Nisha's suggestion from v43_comments_nisha.patch
> > 2. Include Amit's suggestion from v43-0001_amit.1.patch
> > 3. Include Amit's suggestion: change GetLogDestination to
> > GetConflictLogDest(), drop_sub_conflict_log_table() to
> > drop_sub_conflict_log_table(), CONFLICTS_LOGGED_TO_FILE to
> > CONFLICTS_LOGGED_TO_LOG and renamed IsConflictNamespace() to
> > IsConflictLogTableNamespace()
> > 4. Include Vignesh's fix Parallel_apply_failure.patch
> > 5. Change conflict log table name back to conflict_log_table_<subid>
> > 6. Changed acl to exclude INSERT/UPDATE/USAGE as it was there in initial patch
> > 7. Merged 0002 to 0001 and 0004 to 0003 after review
> >
> > Open
> > 1. Vignesh please rebase upgrade and \dRs+ page
>
> Here is the rebased version of the upgrade and \dRs patch which is
> present in v45-0003 and v45-0004. There is no change in v45-0001 and
> v45-0002, they are the same patch as in [1].
Thanks for the updated patches, please find my comments on the v44
patches below. These appear to apply to v45-001 and v45-002 as well,
as both are unchanged.
1) The patches require a catversion bump. It may be worth adding it
now so it does not get missed later during commit.
~~~
2) As ProcessPendingConflictLogTuple() may perform a heap_insert, it
could fail for various reasons, such as table access errors, write
failures, WAL write failures, or ownership-related issues.
I looked into the error handling when CLT insert fails:
2a) When disable_on_error = false
/* Try to allocate a worker for the streaming transaction. */
@@ -5666,6 +5674,7 @@ start_apply(XLogRecPtr origin_startpos)
*/
AbortOutOfAnyTransaction();
pgstat_report_subscription_error(MySubscription->oid);
+ ProcessPendingConflictLogTuple();
PG_RE_THROW();
}
If ProcessPendingConflictLogTuple() fails, PG_RE_THROW() is never
reached, so the original conflict error is lost and only the secondary
CLT-related error is reported, e.g.:
ERROR: could not open relation with OID 16421
LOG: background worker "logical replication apply worker" exited
with exit code 1
If the failure persists, the apply worker will keep retrying and users
may never see the actual conflict error. The same concern applies to
the parallel worker case too.
Is there a clean way to ensure the original apply/conflict error is
still logged even if ProcessPendingConflictLogTuple() itself fails?
~~
2b) When disable_on_error = true i.e. DisableSubscriptionAndExit() path
@@ -6040,6 +6049,8 @@ DisableSubscriptionAndExit(void)
*/
pgstat_report_subscription_error(MyLogicalRepWorker->subid);
+ ProcessPendingConflictLogTuple();
+
/* Disable the subscription */
StartTransactionCommand();
This introduces a potential failure point before the subscription is
disabled. If ProcessPendingConflictLogTuple() fails, the subscription
may never be disabled and the worker could repeatedly hit the same
error.
Would it be safer to move ProcessPendingConflictLogTuple() until after
the subscription has been disabled, so that disabling the subscription
is not dependent on successful CLT processing?
~~
2c) When "conflict_log_destination = table" the log reports:
ERROR: conflict detected on relation "public.t1": conflict=insert_exists
DETAIL: Conflict details are logged to the conflict log table:
pg_conflict_log_16403
However, the subsequent CLT write can still fail:
...
ERROR: could not open relation with OID 16426
LOG: background worker "logical replication apply worker" exited
with exit code 1
Since the CLT write has not yet occurred when this message is emitted,
would it make sense to change it to:
"Conflict details will be logged to the conflict log table: ..."
to avoid implying that the logging has already succeeded?
~~~
3) The v45-002 commit message still refers to the old conflict log
table name, pg_conflict.pg_conflict_log_for_subid_16396. It should be
updated to pg_conflict_log_16396.
~~~
4) 035_conflicts.pl: I see that v45-002 has corrected the CLT name in
below test -
+# Verify the contents of the Conflict Log Table (CLT)
+# This section ensures that the CLT contains the expected
+# type and specific key data.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'sub_tab';");
+my $clt = "pg_conflict.pg_conflict_log_$subid";
+
+# Wait for the conflict to be logged in the CLT
+my $log_check = $node_subscriber->poll_query_until(
+ 'postgres',
+ "SELECT count(*) > 0 FROM $clt;"
+);
I wonder why this wasn't caught in the v44 tests. It seems the check
below is silently broken because poll_query_until() keeps retrying
until timeout when the table does not exist.
Should we replace it with an ok() test instead? something like -
# Wait for the conflict to be logged in the CLT
ok($node_subscriber->poll_query_until(
'postgres',
"SELECT count(*) > 0 FROM $clt;"),
'conflict appeared in CLT after insert');
~~~
--
Thanks,
Nisha
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tomas Vondra | 2026-06-03 10:07:40 | Re: hashjoins vs. Bloom filters (yet again) |
| Previous Message | Oleg Bartunov | 2026-06-03 09:20:35 | Re: hashjoins vs. Bloom filters (yet again) |