| From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Subject: | Re: Proposal: Conflict log history table for Logical Replication |
| Date: | 2026-06-22 08:26:19 |
| Message-ID: | CAJpy0uADR=MkBZAswgNTow6DZQ+AQ-PPcSKCgbCGfF=HYdgxWw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Jun 22, 2026 at 9:33 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 22 Jun 2026 at 08:41, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Sun, Jun 21, 2026 at 7:53 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > While attempting to log a conflict, a concurrent ALTER SUBSCRIPTION
> > > can change the conflict logging destination from all to log. In this
> > > scenario, the apply worker may already have cached the conflictlogdest
> > > information, including the OID of the current conflict log table.
> > > However, the concurrent ALTER SUBSCRIPTION drops the conflict log
> > > table as part of the destination change:
> > > +Relation
> > > +GetConflictLogDestAndTable(ConflictLogDest *log_dest)
> > > +{
> > > + Oid conflictlogrelid;
> > > +
> > > + /*
> > > + * Convert the text log destination to the internal enum.
> > > MySubscription
> > > + * already contains the data from pg_subscription.
> > > + */
> > > + *log_dest = GetConflictLogDest(MySubscription->conflictlogdest);
> > > +
> > > + /* Quick exit if a conflict log table was not requested. */
> > > + if (!CONFLICTS_LOGGED_TO_TABLE(*log_dest))
> > > + return NULL;
> > > +
> > > + conflictlogrelid = MySubscription->conflictlogrelid;
> > > +
> > > + Assert(OidIsValid(conflictlogrelid));
> > > +
> > > + return table_open(conflictlogrelid, RowExclusiveLock);
> > > +}
> > >
> > > As a result, when the apply worker later attempts to open the cached
> > > conflict log table, table_open() fails because the relation has
> > > already been dropped. This causes the error handling path itself to
> > > fail before the conflict record can be written to either the conflict
> > > log table or the server log.
> > >
> > > In such cases, the conflict record is effectively lost and is not
> > > logged anywhere. For example:
> > > 2026-06-21 19:31:13.592 IST [263598] LOG: logical replication apply
> > > worker for subscription "sub1" has started
> > > 2026-06-21 19:32:26.731 IST [263598] ERROR: could not open relation
> > > with OID 16405
> > > 2026-06-21 19:32:26.731 IST [263598] CONTEXT: processing remote data
> > > for replication origin "pg_16404" during message type "INSERT" for
> > > replication target relation "public.t1" in transaction 698, finished
> > > at 0/017D39A0
> > > 2026-06-21 19:32:26.735 IST [263471] LOG: background worker "logical
> > > replication apply worker" (PID 263598) exited with exit code 1
> > >
> > > Ideally, failure to access the conflict log table should not prevent
> > > the conflict from being reported in the server log. This issue is
> > > present with the v52 version. I have not yet checked if Amit's recent
> > > patch posted a few minutes ago at [1] handles this issue.
> > >
> >
> > There are two places in the patch from where we LOG/Insert the
> > conflict data. First is ReportApplyConflict() where we LOG if the
> > conflict arises from a non-ERROR path (aka conflicts other
> > INSERT/UPDATE_EXISTS). In that case, the conflict data will be logged
> > even when we fail to insert into CLT. Second is the place for
> > conflicts that arose as ERRORs (aka INSERT/UPDATE_EXISTS), where the
> > conflict information will be logged along with insert failure as
> > CONTEXT. Can you please verify your test based on this input and share
> > your findings and thoughts?
>
> The scenario I am testing is an insert_exists conflict.
> On the publisher:
> CREATE TABLE t1 (c1 int);
>
> On the subscriber:
> CREATE TABLE t1 (c1 int PRIMARY KEY);
>
> Then execute the following on the publisher:
> INSERT INTO t1 VALUES (10);
> INSERT INTO t1 VALUES (10);
>
> The second insert generates an insert_exists conflict on the
> subscriber. The conflict is reported and logged through the following
> call chain:
> apply_handle_insert
> -> apply_handle_insert_internal
> -> ExecSimpleRelationInsert
> -> CheckAndReportConflict
> -> ReportApplyConflict
>
> Pause execution in ReportApplyConflict() at
> GetConflictLogDestAndTable(), immediately before opening the conflict
> log table:
> ...
> return table_open(conflictlogrelid, RowExclusiveLock);
> ...
>
> While the apply worker is paused, execute the following command concurrently:
> ALTER SUBSCRIPTION sub1
> SET (conflict_log_destination = 'log');
>
> This succeeds and drops the conflict log table:
> NOTICE: dropped conflict log table "pg_conflict.pg_conflict_log_16404"
> for subscription "sub1"
> ALTER SUBSCRIPTION
>
> At this point, GetConflictLogDestAndTable() has already determined
> that the conflict should be logged to a table and has cached the
> corresponding relation OID. However, the concurrent ALTER SUBSCRIPTION
> has removed that table.
>
> When execution resumes, the subsequent table_open() call fails with:
> 2026-06-22 09:24:53.072 IST [304864] ERROR: could not open relation
> with OID 16405
>
> As a result, conflict processing itself fails before the conflict
> details can be recorded. The conflict is therefore not logged to the
> conflict log table and is also not emitted to the server log.
>
I understand this case, but I feel it isn't critical because the table
is going to be dropped in parallel, so ultimately, all data is lost.
At-max, we can provide a LOG when table-open fails, indicating that
the CLT table is dropped concurrently and thus conflict-data cannott
be logged to table.
thanks
Shveta
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jian he | 2026-06-22 08:28:26 | enum_in cache lookup failed for type |
| Previous Message | Nazir Bilal Yavuz | 2026-06-22 08:17:44 | Re: Fix handling of copy_file_range() return value |