Re: Proposal: Conflict log history table for Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-12-01 09:41:56
Message-ID: CAA4eK1+tW8_LiTt1ZCGpH06fq4SpyUaduqtapAT1PUHVKBGrxg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 1, 2025 at 2:58 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > Since there is a concern that multiple rows for
> > > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > > that this is actually more prone to causing data-bloat if it is not
> > > resolved on time, as it seems a far more frequent scenario. So shall
> > > we keep inserting the record or insert it once and avoid inserting it
> > > again based on lsn? Thoughts?
> >
> > I agree, this is the real problem related to bloat so maybe we can see
> > if the same tuple exists we can avoid inserting it again, although I
> > haven't put thought into how we distinguish between a new conflict
> > on the same row vs the same conflict being inserted multiple times due
> > to worker restart.
> >
>
> If there is consensus on this approach, IMO, it appears safe to rely
> on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
> the given 'conflict_type' before we insert a new record.
>

What happens if, for a multiple_unique_conflicts case, only some of the
rows conflict in the next apply round (say the user has removed a few
of the conflicting rows in the meantime)? I think the ideal way for
users to avoid such repeated occurrences is to configure the
subscription with disable_on_error. I think we should log the conflicts
again on retry. It is better to keep this consistent with what we print
in the LOG, because in the future we may want to give users an option
for where to log the conflicts (in conflict_history_table, the LOG, or
both).
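
To illustrate the concern, here is a rough Python sketch (not PostgreSQL
code) of what could happen if we skip inserts based on 'conflict_type',
'remote_origin' and 'remote_commit_lsn'. The Conflict class, the helper
functions, the origin name, the LSN and the row keys are all made up for
illustration; only the three comparison keys come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    conflict_type: str
    remote_origin: str
    remote_commit_lsn: str
    # Hypothetical field: local rows that conflicted in this apply round.
    conflicting_keys: frozenset


def dedup_key(c):
    # Comparison keys proposed upthread.
    return (c.conflict_type, c.remote_origin, c.remote_commit_lsn)


def log_with_dedup(history, c):
    # Skip the insert if a record with the same keys is already present.
    if any(dedup_key(old) == dedup_key(c) for old in history):
        return False
    history.append(c)
    return True


def log_always(history, c):
    # Insert on every retry, matching what the server LOG would show.
    history.append(c)
    return True


# First apply attempt: the remote row conflicts with local rows 1, 2 and 3.
first = Conflict("multiple_unique_conflicts", "pg_16399", "0/1566D10",
                 frozenset({1, 2, 3}))
# Retry of the same transaction after the user removed rows 2 and 3.
retry = Conflict("multiple_unique_conflicts", "pg_16399", "0/1566D10",
                 frozenset({1}))

deduped, full = [], []
for c in (first, retry):
    log_with_dedup(deduped, c)
    log_always(full, c)

# The deduplicated history still reports rows 1, 2 and 3, which is stale.
print(sorted(deduped[-1].conflicting_keys))
# The full history ends with the current state: only row 1 conflicts.
print(sorted(full[-1].conflicting_keys))
```

In this sketch the deduplicated history never records that only one row
still conflicts on retry, whereas logging on every attempt does.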

--
With Regards,
Amit Kapila.
