Re: Proposal: Conflict log history table for Logical Replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-12-22 06:48:32
Message-ID: CALDaNm01V3wxb-MhcEoY4WGK+5BDCTfud5fP66hCvdipQWGqjA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 20 Dec 2025 at 16:51, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Sat, Dec 20, 2025 at 3:17 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Tue, 16 Dec 2025 at 09:54, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Sun, 14 Dec 2025 at 21:17, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Sun, Dec 14, 2025 at 3:51 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > I was considering the interdependence between the subscription and the
> > > > > > > conflict log table (CLT). IMHO, it would be logical to establish the
> > > > > > > subscription as dependent on the CLT. This way, if someone attempts to
> > > > > > > drop the CLT, the system would recognize the dependency of the
> > > > > > > subscription and prevent the drop unless the subscription is removed
> > > > > > > first or the CASCADE option is used.
> > > > > > >
> > > > > > > However, while investigating this, I encountered an error [1] stating
> > > > > > > that global objects are not supported in this context. This indicates
> > > > > > > that global objects cannot be made dependent on local objects.
> > > > > > >
> > > > > >
> > > > > > What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> > > > > > objects. For example, consider following case:
> > > > > > postgres=# create table t1(c1 int primary key);
> > > > > > CREATE TABLE
> > > > > > postgres=# \d+ t1
> > > > > > Table "public.t1"
> > > > > > Column | Type | Collation | Nullable | Default | Storage |
> > > > > > Compression | Stats target | Description
> > > > > > --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> > > > > > c1 | integer | | not null | | plain |
> > > > > > | |
> > > > > > Indexes:
> > > > > > "t1_pkey" PRIMARY KEY, btree (c1)
> > > > > > Publications:
> > > > > > "pub1"
> > > > > > Not-null constraints:
> > > > > > "t1_c1_not_null" NOT NULL "c1"
> > > > > > Access method: heap
> > > > > > postgres=# drop index t1_pkey;
> > > > > > ERROR: cannot drop index t1_pkey because constraint t1_pkey on table
> > > > > > t1 requires it
> > > > > > HINT: You can drop constraint t1_pkey on table t1 instead.
> > > > > >
> > > > > > Here, the PK index is created as part for CREATE TABLE operation and
> > > > > > pk_index is not allowed to be dropped independently.
> > > > > >
> > > > > > > Although making an object dependent on global/shared objects is
> > > > > > > possible for certain types of shared objects [2], this is not our main
> > > > > > > objective.
> > > > > > >
> > > > > >
> > > > > > As per my understanding from the above example, we need something like
> > > > > > that only for shared object subscription and (internally created)
> > > > > > table.
> > > > >
> > > > > Yeah that seems to be exactly what we want, so I tried doing that by
> > > > > recording DEPENDENCY_INTERNAL dependency of CLT on subscription[1] and
> > > > > it is behaving as we want[2]. And while dropping the subscription or
> > > > > altering CLT we can delete internal dependency so that CLT get dropped
> > > > > automatically[3]
> > > > >
> > > > > I will send an updated patch after testing a few more scenarios and
> > > > > fixing other pending issues.
> > > > >
> > > > > [1]
> > > > > + ObjectAddressSet(myself, RelationRelationId, relid);
> > > > > + ObjectAddressSet(subaddr, SubscriptionRelationId, subid);
> > > > > + recordDependencyOn(&myself, &subaddr, DEPENDENCY_INTERNAL);
> > > > >
> > > > >
> > > > > [2]
> > > > > postgres[670778]=# DROP TABLE myschema.conflict_log_history2;
> > > > > ERROR: 2BP01: cannot drop table myschema.conflict_log_history2
> > > > > because subscription sub requires it
> > > > > HINT: You can drop subscription sub instead.
> > > > > LOCATION: findDependentObjects, dependency.c:788
> > > > > postgres[670778]=#
> > > > >
> > > > > [3]
> > > > > ObjectAddressSet(object, SubscriptionRelationId, subid);
> > > > > performDeletion(&object, DROP_CASCADE
> > > > > PERFORM_DELETION_INTERNAL |
> > > > > PERFORM_DELETION_SKIP_ORIGINAL);
> > > > >
> > > > >
> > > >
> > > > Here is the patch which implements the dependency and fixes other
> > > > comments from Shveta.
> > >
> > > Thanks for the changes, the new implementation based on dependency
> > > creates a cycle while dumping:
> > > ./pg_dump -d postgres -f dump1.txt -p 5433
> > > pg_dump: warning: could not resolve dependency loop among these items:
> > > pg_dump: detail: TABLE conflict (ID 225 OID 16397)
> > > pg_dump: detail: SUBSCRIPTION (ID 3484 OID 16396)
> > > pg_dump: detail: POST-DATA BOUNDARY (ID 3491)
> > > pg_dump: detail: TABLE DATA t1 (ID 3485 OID 16384)
> > > pg_dump: detail: PRE-DATA BOUNDARY (ID 3490)
> > >
> > > This can be seen with a simple subscription with conflict_log_table.
> > > This was working fine with the v11 version patch.
> >
> > The attached v13 patch includes the fix for this issue. In addition,
> > it now raises an error when attempting to configure a conflict log
> > table that belongs to a temporary schema or is not a permanent
> > (persistent) relation.
>
> I have updated the patch and here are changes done
> 1. Splitted into 2 patches, 0001- for catalog related changes
> 0002-inserting conflict into the conflict table, Vignesh need to
> rebase the dump and upgrade related patch on this latest changes
> 2. Subscription option changed to conflict_log_destination=(log/table/all/'')
> 3. For internal processing we will use ConflictLogDest enum whereas
> for taking input or storing into catalog we will use string [1].
> 4. As suggested by Sawada San, if conflict_log_destination is 'table'
> we log the information about conflict but don't log the tuple
> details[3]
>
> Pending:
> 1. tap test for conflict insertion
> 2. Still need to work on caching related changes discussed at [2], so
> currently we don't allow conflict log tables to be added to
> publication at all and might change this behavior as discussed at [2]
> and for that we will need to implement the caching.
> 3. Need to add conflict insertion test and doc changes.
> 4. Still need to check on the latest comments from Peter Smith.
>
>
> [1]
> typedef enum ConflictLogDest
> {
> CONFLICT_LOG_DEST_INVALID = 0,
> CONFLICT_LOG_DEST_LOG, /* "log" (default) */
> CONFLICT_LOG_DEST_TABLE, /* "table" */
> CONFLICT_LOG_DEST_ALL /* "all" */
> } ConflictLogDest;

Consider the following scenario. Initially, the subscription was
configured with conflict_log_destination set to a table. As conflicts
occurred, entries were generated and recorded in that table, for
example:
postgres=# SELECT * FROM conflict_log_table_16399;
relid | schemaname | relname | conflict_type | remote_xid |
remote_commit_lsn | remote_commit_ts | remote_origin |
replica_identity | remote_tuple |
local_conflicts
-------+------------+---------+---------------+------------+-------------------+----------------------------------+---------------+------------------+--------------+-------------------------
-------------------------------------------------------------------------
16384 | public | t1 | insert_exists | 765 |
0/0178A718 | 2025-12-22 12:06:57.417789+05:30 | pg_16399 |
| {"c1":1} | {"{\"xid\":\"781\",\"com
mit_ts\":null,\"origin\":null,\"key\":{\"c1\":1},\"tuple\":{\"c1\":1}}"}
16384 | public | t1 | insert_exists | 765 |
0/0178A718 | 2025-12-22 12:06:57.417789+05:30 | pg_16399 |
| {"c1":1} | {"{\"xid\":\"781\",\"com
mit_ts\":null,\"origin\":null,\"key\":{\"c1\":1},\"tuple\":{\"c1\":1}}"}
(2 rows)

Subsequently, the conflict log destination was changed from table to log:
ALTER SUBSCRIPTION sub1 SET (conflict_log_destination = 'log');

As a result, the conflict log table is dropped, and there is no longer
any way to access the previously recorded conflict entries. This
effectively causes the loss of historical conflict data.

It is unclear whether this behavior is desirable or expected. Should
we consider a way to preserve the historical conflict data in this
case?

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-12-22 07:19:39 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Previous Message Amit Kapila 2025-12-22 06:26:57 Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions