Re: pg_publication_tables: return NULL attnames when no column list is specified

From: Roberto Mello <roberto(dot)mello(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_publication_tables: return NULL attnames when no column list is specified
Date: 2026-03-31 21:54:51
Message-ID: CAKz==b+30jUre6sZE85=afnn66FaQ926=4M1YobfsKoCJ=Mb+g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 31, 2026 at 2:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Tue, Mar 31, 2026 at 12:02 AM Roberto Mello <roberto(dot)mello(at)gmail(dot)com>
> wrote:
> >
> > The subscriber receives WAL from both publications for the same table.
> > Which column set should it apply? It cannot apply both as they disagree
> on
> > whether email is included. This is exactly the situation the "cannot use
> > different column lists" check was designed to prevent.
> >
>
> I think we need to consider the cases where current permissive
> behavior helps. For example, consider the cases where the schema is
>

I'm sorry but you completely dismissed the points that I brought up,
addressing
none of the concerns, and went on a tangent about "permissive behavior".

Permissive behavior is fine and dandy until it bites you in the @$$, and
there's
a line where permissive is wrong, bad, or dangerous. I pointed out why I
think
the current behavior is wrong, potentially dangerous. MySQL is king of
permissive
behavior but I don't think that's much to brag about. PostgreSQL
historically
favored correctness over permissiveness.

The deeper issue is that the "permissive" behavior isn't actually
permissive: it's
silently inconsistent. The two publications have different replication
contracts stored
in the catalog (prattrs = NULL vs prattrs = {1,2}), and pgoutput.c honors
that difference
at WAL decode time. The old code just hid the difference from the check
that was
supposed to detect it.

static. Now let us consider another case where a user would actually
> need to define such publication combinations for a subscription. One
> of the more common ways this conflict happens is accidental: User has
> pub_1 for TABLE t (col1, col2). User later decides to replicate the
> entire database to a new subscription and creates pub_2 FOR ALL
> TABLES. User adds pub_2 to the subscription. Currently, the user can
> add pub_2 and then later realize they need to drop pub_1 to clean
>

How would they "realize" that if the view is showing them the exact
same data for the two publications?

> things up. If ALTER SUBSCRIPTION blocks this, the user is stuck in a
> Catch-22 where they can't add the "All Tables" publication because a
> single specific table has a column list. They would have to drop the
> specific publication first, potentially losing replication coverage
> for that table during the gap.
>

I don't think it actually traps the user. The sequence that works today and
would continue to work:

-- Starting from sub has pub_1 with explicit column list
ALTER PUBLICATION pub_1 DROP TABLE t; -- on publisher
ALTER SUBSCRIPTION sub REFRESH PUBLICATION; -- on subscriber
-- Now add pub_2 FOR ALL TABLES
ALTER SUBSCRIPTION sub ADD PUBLICATION pub_2;

Or even simpler, just drop the column list from pub_1 before adding pub_2:

ALTER PUBLICATION pub_1 SET TABLE t; -- removes the column
list
ALTER SUBSCRIPTION sub ADD PUBLICATION pub_2;

There's no gap in replication coverage in either case because the
subscription is still
active throughout. The table remains subscribed via pub_1 until pub_2 takes
over.

For the static schema case: if the schema never changes, the two
publications do produce
identical column sets and there's no practical problem. But the publication
system doesn't
know the schema is static. It has to handle the general case. And the
general case is that
NULL means "all current and future columns" while an explicit list means
"exactly these,
nothing more." Those are different things (contracts) and being
"permissive" with interpretation
of what those contracts mean in my head versus someone else's could be
dangerous.

> Now, considering the other cases where replication later ERRORs out
> (like the one you mentioned) when we allow such combinations, we can
> give a WARNING at the time subscriber DDLs when they lead to such
> combinations.
>

A WARNING at ALTER SUBSCRIPTION time would be better than silence, but it
still
allows the subscriber into a state where a future ALTER TABLE ADD COLUMN on
the
publisher will cause replication to break. At that point the user gets the
ERROR anyway,
but now it's during replication rather than at setup time, which is harder
to diagnose and
recover from. Erroring early, when the user is actively making the change
and can fix it,
is the safer default.

That said, if the consensus is that the migration path is too disruptive, a
middle ground
could be to ERROR by default but provide a subscription-level option to
downgrade it to
WARNING. For the reasons I pointed out, I think the right call is to keep
the ERROR as
the default though, since it catches a real inconsistency.

Roberto Mello
Snowflake

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-03-31 22:05:46 Re: Buffer locking is special (hints, checksums, AIO writes)
Previous Message Jeff Davis 2026-03-31 21:34:28 Re: Do we still need MULE_INTERNAL?