Re: Proposal: Conflict log history table for Logical Replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-11-12 09:44:06
Message-ID: CAJpy0uAu2paxGAEffD=vaBTW9Jqbtxxawb8K8FgiASfeKPnGog@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 12, 2025 at 2:40 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
>
> > I agree that marking tables with a flag to easily exclude them during
> > publishing would be cleaner. In the current patch, for an ALL-TABLES
> > publication, we scan pg_subscription for each table in pg_class to
> > check its subconflicttable and decide whether to ignore it. But since
> > this only happens during create/alter subscription and refresh
> > publication, the overhead should be acceptable.
>
> Thanks for your opinion.
>
> > Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
> > enhancement but since we already have the EXCEPT list built in a
> > separate thread, that might be sufficient for now. IMO, such
> > conflict-tables should be marked internally (for example, with a
> > ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
> > identified within the system, without requiring users to explicitly
> > specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
> > see what others think on this.
> > For the time being, the current implementation looks fine, considering
> > it runs only during a few publication-related DDL operations.
>
> +1
>
> Here is the rebased patch, changes apart from rebasing it
> 1) Dropped the conflict history table during drop subscription
> 2) Added test cases for testing the conflict history table behavior
> with CREATE/ALTER/DROP subscription

Thanks.

> TODO:
> 1) Need more thoughts on the table schema whether we need to capture
> more items or shall we drop some fields if we think those are not
> necessary.

Yes, this needs some more thoughts. I will review.

I feel since design is somewhat agreed upon, we may handle
code-correction/completion. I have not looked at the rebased patch
yet, but here are a few comments based on old-version.

Few observations related to publication.
------------------------------

(In the below comments, clt/CLT implies Conflict Log Table)

1)
'select pg_relation_is_publishable(clt)' returns true for conflict-log table.

2)
'\d+ clt' shows all-tables publication name. I feel we should not
show that for clt.

3)
I am able to create a publication for clt table, should it be allowed?

create subscription sub1 connection '...' publication pub1
WITH(conflict_log_table='clt');
create publication pub3 for table clt;

4)
Is there a reason we have not made '!IsConflictHistoryRelid' check as
part of is_publishable_class() itself? If we do so, other code-logics
will also get clt as non-publishable always (and will solve a few of
the above issues I think). IIUC, there is no place where we want to
mark CLT as publishable or is there any?

5) Also, I feel we can add some documentation now to help others to
understand/review the patch better without going through the long
thread.

Few observations related to conflict-logging:
------------------------------
1)
I found that for the conflicts which ultimately result in Error, we do
not insert any conflict-record in clt.

a)
Example: insert_exists, update_Exists
create table tab1 (i int primary key, j int);
sub: insert into tab1 values(30,10);
pub: insert into tab1 values(30,10);
ERROR: conflict detected on relation "public.tab1": conflict=insert_exists
No record in clt.

sub:
<some pre-data needed>
update tab1 set i=40 where i = 30;
pub: update tab1 set i=40 where i = 20;
ERROR: conflict detected on relation "public.tab1": conflict=update_exists
No record in clt.

b)
Another question related to this is, since these conflicts (which
results in error) keep on happening until user resolves these or skips
these or 'disable_on_error' is set. Then are we going to insert these
multiple times? We do count these in 'confl_insert_exists' and
'confl_update_exists' everytime, so it makes sense to log those each
time in clt as well. Thoughts?

2)
Conflicts where row on sub is missing, local_ts incorrectly inserted.
It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
indicating that it is not applicable for this conflict-type?

Example: delete_missing, update_missing
pub:
insert into tab1 values(10,10);
insert into tab1 values(20,10);
sub: delete from tab1 where i=10;
pub: delete from tab1 where i=10;

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-11-12 10:06:19 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Previous Message Yugo Nagata 2025-11-12 09:34:17 Re: Suggestion to add --continue-client-on-abort option to pgbench