Re: Proposal: Conflict log history table for Logical Replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-12-01 04:41:08
Message-ID: CAFiTN-sVK6Bp+BawCJU_WpAXQSTX4OkKmce5EE4YNBgD-XSjZw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 28, 2025 at 5:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Nov 18, 2025 at 3:40 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > 3)
> > > > We also need to think how we are going to display the info in case of
> > > > multiple_unique_conflicts as there could be multiple local and remote
> > > > tuples conflicting for one single operation. Example:
> > > >
> > > > create table conf_tab (a int primary key, b int unique, c int unique);
> > > >
> > > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > > >
> > > > pub: insert into conf_tab values (2,3,4);
> > > >
> > > > ERROR: conflict detected on relation "public.conf_tab":
> > > > conflict=multiple_unique_conflicts
> > > > DETAIL: Key already exists in unique index "conf_tab_pkey", modified
> > > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > > CONTEXT: processing remote data for replication origin "pg_16392"
> > > > during message type "INSERT" for replication target relation
> > > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > > >
> > > > Currently in clt, we have singular terms such as 'key_tuple',
> > > > 'local_tuple', 'remote_tuple'. Shall we have multiple rows inserted?
> > > > But it does not look reasonable to have multiple rows inserted for a
> > > > single conflict raised. I will think more about this.
> > >
> > > Currently I am inserting multiple records into the conflict history
> > > table, one record per conflicting tuple that is logged, but I
> > > couldn't find a better way to do this.
> > >
>
> The biggest drawback of this approach is data bloat. The incoming data
> row will be stored multiple times.
>
> > > Another option is to use an array of tuples instead of a
> > > single tuple, but I am not sure; that might make things more
> > > complicated for any external tool to process.
> >
> > It’s arguable and hard to say what the correct behaviour should be.
> > I’m slightly leaning toward having a single row per conflict.
> >
>
> Yeah, it is better to either have a single row per conflict or to have
> two tables, conflict_history and conflict_history_details, to avoid the
> data bloat pointed out above. For example, the two-table approach could be:
>
> 1. The Header Table (Incoming Data)
> This stores the incoming data that we attempted to apply.
>
> CREATE TABLE conflict_header (
>     conflict_id SERIAL PRIMARY KEY,
>     source_tx_id VARCHAR(100), -- transaction ID from the source
>     table_name VARCHAR(100),
>     operation CHAR(1),         -- 'I' for Insert
>     incoming_data JSONB,       -- store the incoming row as JSON
>     ...
> );
>
> 2. The Detail Table (Existing Conflicting Data)
> This stores the actual rows currently in the database that caused the
> violations.
>
> CREATE TABLE conflict_details (
>     detail_id SERIAL PRIMARY KEY,
>     conflict_id INT REFERENCES conflict_header(conflict_id),
>     constraint_name/key_tuple VARCHAR(100),
>     conflicting_row_data JSONB -- the existing row in the DB
>                                -- that blocked the insert
> );
>
> Please don't take these exact columns literally; you can use something
> along the lines of what is proposed in the patch. This is just to show
> how the conflict data can be rearranged. One argument against this is
> that users need a JOIN to query the data, but that is still better than
> bloating the table. Alternatively, the single-table idea could be
> changed to have columns like:
>
>     violated_constraints TEXT[], -- e.g., ['uk_email', 'uk_phone']
>     error_details JSONB          -- e.g., [{"const": "uk_email", "val": "a(at)b(dot)com"}, ...]
>
> If we want to store multiple conflicting tuples in a single column, we
> need to ensure it is queryable via a JSONB column. The point in favour
> of a single JSONB column combining the multiple conflicting tuples is
> that we need this combination only for one kind of conflict.
>
> Both approaches have their pros and cons. I feel we should dig a bit
> deeper into both by laying out the details of each method and seeing
> what others think.
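
To make the single-table alternative concrete, a rough sketch could look
like the following. The column names and types here are illustrative
only, not taken from the patch:

```sql
-- Sketch only: one history row per conflict, with the per-index
-- details folded into array/JSONB columns (illustrative names).
CREATE TABLE conflict_history (
    conflict_id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    relid                OID,
    conflict_type        TEXT,   -- e.g. 'multiple_unique_conflicts'
    remote_tuple         JSONB,  -- the incoming row, stored once
    violated_constraints TEXT[], -- e.g. '{conf_tab_pkey,conf_tab_b_key}'
    local_tuples         JSONB   -- e.g. [{"index": "conf_tab_pkey",
                                 --        "tuple": {"a": 2, "b": 2, "c": 2}}, ...]
);

-- JSONB keeps the per-index details queryable without unnesting,
-- e.g. all conflicts that involved a particular index:
SELECT conflict_id, remote_tuple
FROM conflict_history
WHERE local_tuples @> '[{"index": "conf_tab_pkey"}]';
```

The `@>` containment operator matches individual array elements, so an
external tool can filter on one violated index even when several are
packed into the same row.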

The specific scenario we are discussing, where a single row from the
publisher attempts an operation that conflicts on multiple unique keys,
with each of those unique key violations involving a different local
row on the subscriber, is very rare. IMHO this low-frequency scenario
does not justify overcomplicating the design with an array field or a
multi-level table.

Consider the infrequency of the root causes:
- How often does a table have more than 3 or 4 unique keys?
- How frequently would each of those keys conflict with a different
local row on the subscriber side?

If recording this occasional, synthetic conflict requires inserting
two or three rows instead of a single one, that is an acceptable
trade-off considering how rarely it occurs. Anyway, this is my opinion,
and I am open to hearing what others think.
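
For the three-key example upthread, the multi-row approach would land
roughly as follows; this is a schematic illustration, not the patch's
exact schema:

```sql
-- Schematic: one history row per violated unique index for the single
-- incoming row (2,3,4); each row repeats the same remote tuple.
--
--  index          | key     | local_tuple | remote_tuple
-- ----------------+---------+-------------+--------------
--  conf_tab_pkey  | (a)=(2) | (2, 2, 2)   | (2, 3, 4)
--  conf_tab_b_key | (b)=(3) | (3, 3, 3)   | (2, 3, 4)
--  conf_tab_c_key | (c)=(4) | (4, 4, 4)   | (2, 3, 4)
```

So the duplication is limited to the remote tuple, repeated once per
violated key, and only for this one conflict type.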

--
Regards,
Dilip Kumar
Google
