Re: Proposal: Conflict log history table for Logical Replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date: 2025-11-12 06:50:55
Message-ID: CAJpy0uDKbYWt+YPADj=4fHEvrGEWgnG1n_YsiGT_EZiZf0VSAw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Sep 25, 2025 at 4:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > > [1]
> > > > /*
> > > > * For logical decode we need combo CIDs to properly decode the
> > > > * catalog
> > > > */
> > > > if (RelationIsAccessibleInLogicalDecoding(relation))
> > > > log_heap_new_cid(relation, &tp);
> > > >
> > >
> > > Meanwhile I am also exploring the option where we can just CREATE TYPE
> > > in initialize_data_directory() during initdb, basically we will create
> > > this type in template1 so that it will be available in all the
> > > databases, and that would simplify the table creation whether we
> > > create internally or we allow user to create it. And while checking
> > > is_publishable_class we can check the type and avoid publishing those
> > > tables.
> > >
> >
> > Based on my off list discussion with Amit, one option could be to set
> > HEAP_INSERT_NO_LOGICAL option while inserting tuple into conflict
> > history table, for that we can not use SPI interface to insert instead
> > we will have to directly call the heap_insert() to add this option.
> > Since we do not want to create any trigger etc on this table, direct
> > insert should be fine, but if we plan to create this table as
> > partitioned table in future then direct heap insert might not work.
>
> Upon further reflection, I realized that while this approach avoids
> streaming inserts to the conflict log history table, it still requires
> that table to exist on the subscriber node upon subscription creation,
> which isn't ideal.
>
> We have two main options to address this:
>
> Option1:
> When calling pg_get_publication_tables(), if the 'alltables' option is
> used, we can scan all subscriptions and explicitly ignore (filter out)
> all conflict history tables. This will not be very costly as this
> will scan the subscriber when pg_get_publication_tables() is called,
> which is only called during create subscription/alter subscription on
> the remote node.
>
> Option2:
> Alternatively, we could introduce a table creation option, like a
> 'non-publishable' flag, to prevent a table from being streamed
> entirely. I believe this would be a valuable, independent feature for
> users who want to create certain tables without including them in
> logical replication.
>
> I prefer option2, as I feel this can add value independent of this patch.
>

I agree that marking tables with a flag to easily exclude them during
publishing would be cleaner. In the current patch, for an ALL TABLES
publication, we scan pg_subscription for each table in pg_class and
compare the table against subconflicttable to decide whether to ignore
it. But since this only happens during CREATE/ALTER SUBSCRIPTION and
REFRESH PUBLICATION, the overhead should be acceptable.
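
To make the cost of that check concrete, here is a minimal standalone
sketch of the filtering logic. It is not the patch code and does not
use the backend's catalog APIs: SubscriptionRow, is_conflict_log_table()
and filter_all_tables() are hypothetical names, and plain arrays stand
in for the pg_class and pg_subscription scans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid type */

/* Hypothetical, simplified view of one pg_subscription row. */
typedef struct SubscriptionRow
{
	Oid			subid;
	Oid			subconflicttable;	/* 0 means no conflict log table */
} SubscriptionRow;

/* True if relid is the conflict log table of any subscription. */
bool
is_conflict_log_table(Oid relid, const SubscriptionRow *subs, size_t nsubs)
{
	assert(subs != NULL || nsubs == 0);

	for (size_t i = 0; i < nsubs; i++)
	{
		if (subs[i].subconflicttable != 0 &&
			subs[i].subconflicttable == relid)
			return true;
	}
	return false;
}

/*
 * Copy every relation that is not a conflict log table into "out" and
 * return how many were copied.  One pass over the subscriptions per
 * relation, so the cost is O(nrels * nsubs).
 */
size_t
filter_all_tables(const Oid *rels, size_t nrels,
				  const SubscriptionRow *subs, size_t nsubs,
				  Oid *out)
{
	size_t		nout = 0;

	assert(rels != NULL && out != NULL);

	for (size_t i = 0; i < nrels; i++)
	{
		if (!is_conflict_log_table(rels[i], subs, nsubs))
			out[nout++] = rels[i];
	}
	return nout;
}
```

The nested loop is the overhead in question: it grows with the number
of tables times the number of subscriptions, but it is only paid when
the table list is built.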

Introducing a 'NON_PUBLISHABLE_TABLE' option would be a good
enhancement, but since we already have the EXCEPT list from a
separate thread, that might be sufficient for now. IMO, such
conflict tables should be marked internally (for example, with a
'non_publishable' or 'conflict_log_table' flag) so they can be easily
identified within the system, without requiring users to explicitly
specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
hear what others think about this.

For the time being, the current implementation looks fine, considering
it runs only during a few publication-related DDL operations.
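
For comparison, the internal-marker idea could look roughly like the
sketch below. This is only an illustration under assumptions: ClassRow,
its is_conflict_log_table field, rel_is_publishable() and
count_publishable() are hypothetical and do not correspond to any
existing catalog column or backend function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid type */

/* Hypothetical, simplified view of one pg_class row. */
typedef struct ClassRow
{
	Oid			relid;
	bool		is_conflict_log_table;	/* assumed internal marker */
} ClassRow;

/* With a per-relation marker, no pg_subscription scan is needed. */
bool
rel_is_publishable(const ClassRow *rel)
{
	assert(rel != NULL);
	return !rel->is_conflict_log_table;
}

/* Count the relations an ALL TABLES publication would include. */
size_t
count_publishable(const ClassRow *rels, size_t nrels)
{
	size_t		count = 0;

	assert(rels != NULL || nrels == 0);

	for (size_t i = 0; i < nrels; i++)
	{
		if (rel_is_publishable(&rels[i]))
			count++;
	}
	return count;
}
```

Here the check is constant time per table, and the user never has to
list the conflict table in EXCEPT.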

thanks
Shveta
