Re: Proposal: Conflict log history table for Logical Replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-12-01 08:34:08
Message-ID: CAFiTN-s3ardLbL9jQ=tXkuwY+-1GJNPLwh=n8knHx=rRvcOqvA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > > Few observations related to publication.
> > > > ------------------------------
> >
> > Thanks Shveta, for testing and sharing your thoughts. IMHO for
> > conflict log tables it should be good enough if we restrict it when
> > ALL TABLE options are used, I don't think we need to put extra effort
> > to completely restrict it even if users want to explicitly list it
> > into the publication.
> >
> > > >
> > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > >
> > > > 1)
> > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
> >
> > This function is used while publishing every single change and I don't
> > think we want to add a cost to check each subscription to identify
> > whether the table is listed as CLT.
> >
> > > > 2)
> > > > '\d+ clt' shows all-tables publication name. I feel we should not
> > > > show that for clt.
> >
> > I think we should fix this.
> >
> > > > 3)
> > > > I am able to create a publication for clt table, should it be allowed?
> >
> > I believe we should not do any specific handling to restrict this but
> > I am open for the opinions.
> >
> > > > create subscription sub1 connection '...' publication pub1
> > > > WITH(conflict_log_table='clt');
> > > > create publication pub3 for table clt;
> > > >
> > > > 4)
> > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > will also get clt as non-publishable always (and will solve a few of
> > > > the above issues I think). IIUC, there is no place where we want to
> > > > mark CLT as publishable or is there any?
> >
> > IMHO the main reason is performance.
> >
> > > > 5) Also, I feel we can add some documentation now to help others to
> > > > understand/review the patch better without going through the long
> > > > thread.
> >
> > Make sense, I will do that in the next version.
> >
> > > >
> > > > Few observations related to conflict-logging:
> > > > ------------------------------
> > > > 1)
> > > > I found that for the conflicts which ultimately result in Error, we do
> > > > not insert any conflict-record in clt.
> > > >
> > > > a)
> > > > Example: insert_exists, update_Exists
> > > > create table tab1 (i int primary key, j int);
> > > > sub: insert into tab1 values(30,10);
> > > > pub: insert into tab1 values(30,10);
> > > > ERROR: conflict detected on relation "public.tab1": conflict=insert_exists
> > > > No record in clt.
> > > >
> > > > sub:
> > > > <some pre-data needed>
> > > > update tab1 set i=40 where i = 30;
> > > > pub: update tab1 set i=40 where i = 20;
> > > > ERROR: conflict detected on relation "public.tab1": conflict=update_exists
> > > > No record in clt.
> >
> > Yeah that interesting need to put thought on how to commit this record
> > when an outer transaction is aborted as we do not have autonomous
> > transactions which are generally used for this kind of logging. But
> > we can explore more options like inserting into conflict log tables
> > outside the outer transaction.
> >
> > > > b)
> > > > Another question related to this is, since these conflicts (which
> > > > results in error) keep on happening until user resolves these or skips
> > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > time in clt as well. Thoughts?
> >
> > I think it make sense to insert every time we see the conflict, but it
> > would be good to have opinion from others as well.
>
> Since there is a concern that multiple rows for
> multiple_unique_conflicts can cause data-bloat, it made me rethink
> that this is actually more prone to causing data-bloat if it is not
> resolved on time, as it seems a far more frequent scenario. So shall
> we keep inserting the record or insert it once and avoid inserting it
> again based on lsn? Thoughts?

I agree, this is the real problem related to bloat so maybe we can see
if the same tuple exists we can avoid inserting it again, although I
haven't put thought on how to we distinguish between the new conflict
on the same row vs the same conflict being inserted multiple times due
to worker restart.

--
Regards,
Dilip Kumar
Google

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2025-12-01 08:41:20 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Previous Message shveta malik 2025-12-01 08:27:40 Re: Proposal: Conflict log history table for Logical Replication