Re: Proposal: Conflict log history table for Logical Replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2026-07-03 03:53:49
Message-ID: CAFiTN-sz2XroMyuKpK6YrB9z+F3bRdS+wN9VxMQjr9=7o3BfDQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 3, 2026 at 9:09 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Jul 2, 2026 at 5:30 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Jul 2, 2026 at 4:48 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > >
> > > On Thu, 2 Jul 2026 at 13:39, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jul 1, 2026 at 6:55 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Wed, Jul 1, 2026 at 1:16 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Wed, Jul 1, 2026 at 12:45 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > Thanks I have fixed these comments and also merged the docs.
> > > > > >
> > > > > > Thanks. Sharing the version with pgindent run on v61.
> > > > >
> > > > > Here is the updated patch version. It fixes offlist review comments
> > > > > provided by Amit and Shveta, bumps the catversion and adds authors and
> > > > > reviewers details.
> > > >
> > > > I rebased the remaining patches on top of HEAD. So far, I have run
> > > > pgindent and completed the doc merge for 0001. The 0002 and 0003 are
> > > > just rebased.
> > >
> > > Hi Dilip,
> > >
> > > While testing I hit the following assert:
> > > + if (OidIsValid(relid))
> > > + {
> > > + /* Existing table from upgrade */
> > > + Assert(IsBinaryUpgrade);
> > > + }
> > >
> > > Steps to reproduce:
> > > 1. Suppose initially we have a logical replication setup and the
> > > subscriber is created with 'conflict_log_destination = log'
> > > 2. Now we have two psql sessions of the subscriber node. Attach GDB to
> > > first session and add breakpoint to line:
> > > AlterSubscription->LockSharedObject(SubscriptionRelationId, subid, 0,
> > > AccessExclusiveLock);
> > > 3.Execute 'ALTER SUBSCRIPTION sub1 SET (conflict_log_destination =
> > > 'all')' on the first session. It will stop at the breakpoint.
> > > 4. Now execute 'ALTER SUBSCRIPTION sub1 SET (conflict_log_destination
> > > = 'all');' on the second session. The execution on the second session
> > > is successful and a conflict log table is created.
> > > 5. Continue the execution of the first session. This will hit the
> > > above Assert condition.
> > >
> > > I think this happens because for the first session, we have the stale
> > > copy of 'sub'.
> > > While the first session is stopped at the breakpoint, the second
> > > session successfully executes ALTER SUBSCRIPTION and creates the
> > > conflict log table. When the first session resumes, it still sees the
> > > old value of old_dest (i.e CONFLICT_LOG_DEST_LOG), so
> > > 'opts.conflictlogdest != old_dest' evaluates to true and
> > > 'alter_sub_conflict_log_dest()' is invoked.
> > > Since the second session has already created the conflict log table
> > > and 'want_table' is true, the code finds an existing conflict log
> > > table '(OidIsValid(relid))', causing the assert to fire.
> >
> > In short, if I understand correctly, the issue is that altering a
> > subscription that tries to create the table might hit this assert
> > because we assume that if we are switching from 'log' to 'table' or
> > 'all', there should not be an existing table except in binary upgrade
> > mode. This happens because we read the destination information from
> > subscription cache while checking the table status in real time. In
> > fact in other cases, it might cause alter subscription to error out if
> > the table is created while we are creating the table. Therefore, we
> > need to handle concurrent table creation during ALTER SUBSCRIPTION.
> > One option is: if the table exists and it's not a binary upgrade,
> > should we just error out?
> >
>
> No, I feel it should be Assert only. IMO, we should re-fetch 'sub'
> beofre CLT creation or move first 'sub' fetching itself post
> LockSharedObject if there is nothing blocking it. Thoughts?

Yeah, IMHO once we fix [1] this issue should be automatically resolved.

[1] https://www.postgresql.org/message-id/akZUpiDa1UfmzYxL%40bdtpg

--
Regards,
Dilip Kumar
Google

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xuneng Zhou 2026-07-03 03:54:31 Re: BUG: ReadStream look-ahead exhausts local buffers when effective_io_concurrency>=64
Previous Message Mario González Troncoso 2026-07-03 03:47:58 Possible replace of strncpy on xactdesc.c