Re: Proposal: Conflict log history table for Logical Replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-12-01 09:27:53
Message-ID: CAJpy0uADo1SX2p8Ybs_wmSrOH4BL1GvodibwA1sbpZow4rt9pA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > > Few observations related to publication.
> > > > > ------------------------------
> > >
> > > Thanks Shveta, for testing and sharing your thoughts. IMHO for
> > > conflict log tables it should be good enough if we restrict it when
> > > ALL TABLE options are used, I don't think we need to put extra effort
> > > to completely restrict it even if users want to explicitly list it
> > > into the publication.
> > >
> > > > >
> > > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > > >
> > > > > 1)
> > > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
> > >
> > > This function is used while publishing every single change and I don't
> > > think we want to add a cost to check each subscription to identify
> > > whether the table is listed as CLT.
> > >
> > > > > 2)
> > > > > '\d+ clt' shows all-tables publication name. I feel we should not
> > > > > show that for clt.
> > >
> > > I think we should fix this.
> > >
> > > > > 3)
> > > > > I am able to create a publication for clt table, should it be allowed?
> > >
> > > I believe we should not do any specific handling to restrict this but
> > > I am open for the opinions.
> > >
> > > > > create subscription sub1 connection '...' publication pub1
> > > > > WITH(conflict_log_table='clt');
> > > > > create publication pub3 for table clt;
> > > > >
> > > > > 4)
> > > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > > will also get clt as non-publishable always (and will solve a few of
> > > > > the above issues I think). IIUC, there is no place where we want to
> > > > > mark CLT as publishable or is there any?
> > >
> > > IMHO the main reason is performance.
> > >
> > > > > 5) Also, I feel we can add some documentation now to help others to
> > > > > understand/review the patch better without going through the long
> > > > > thread.
> > >
> > > Make sense, I will do that in the next version.
> > >
> > > > >
> > > > > Few observations related to conflict-logging:
> > > > > ------------------------------
> > > > > 1)
> > > > > I found that for the conflicts which ultimately result in Error, we do
> > > > > not insert any conflict-record in clt.
> > > > >
> > > > > a)
> > > > > Example: insert_exists, update_Exists
> > > > > create table tab1 (i int primary key, j int);
> > > > > sub: insert into tab1 values(30,10);
> > > > > pub: insert into tab1 values(30,10);
> > > > > ERROR: conflict detected on relation "public.tab1": conflict=insert_exists
> > > > > No record in clt.
> > > > >
> > > > > sub:
> > > > > <some pre-data needed>
> > > > > update tab1 set i=40 where i = 30;
> > > > > pub: update tab1 set i=40 where i = 20;
> > > > > ERROR: conflict detected on relation "public.tab1": conflict=update_exists
> > > > > No record in clt.
> > >
> > > Yeah that interesting need to put thought on how to commit this record
> > > when an outer transaction is aborted as we do not have autonomous
> > > transactions which are generally used for this kind of logging. But
> > > we can explore more options like inserting into conflict log tables
> > > outside the outer transaction.
> > >
> > > > > b)
> > > > > Another question related to this is, since these conflicts (which
> > > > > results in error) keep on happening until user resolves these or skips
> > > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > > time in clt as well. Thoughts?
> > >
> > > I think it make sense to insert every time we see the conflict, but it
> > > would be good to have opinion from others as well.
> >
> > Since there is a concern that multiple rows for
> > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > that this is actually more prone to causing data-bloat if it is not
> > resolved on time, as it seems a far more frequent scenario. So shall
> > we keep inserting the record or insert it once and avoid inserting it
> > again based on lsn? Thoughts?
>
> I agree, this is the real problem related to bloat so maybe we can see
> if the same tuple exists we can avoid inserting it again, although I
> haven't put thought on how to we distinguish between the new conflict
> on the same row vs the same conflict being inserted multiple times due
> to worker restart.
>

If there is consensus on this approach, IMO, it appears safe to rely
on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
the given 'conflict_type' before we insert a new record.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-12-01 09:41:56 Re: Proposal: Conflict log history table for Logical Replication
Previous Message li carol 2025-12-01 09:24:11 回复: UPDATE run check constraints for affected columns only