Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root

From: Mikhail Kharitonov <mikhail(dot)kharitonov(dot)dev(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
Date: 2025-05-29 06:30:31
Message-ID: CAKkoVau0NBdKApEOqnuN+vba6RP41Xs73Y=L7SALuOyhPQhHVg@mail.gmail.com
Lists: pgsql-hackers

Hi,

Thank you for the feedback.

I would like to clarify that the current behavior does not break replication
between PostgreSQL instances. The logical replication stream is still accepted
by the subscriber, and the data is applied correctly. However, the protocol
semantics are violated, which may cause issues for external systems that rely
on interpreting this stream.

When using publish_via_partition_root = true and setting REPLICA IDENTITY FULL
only on the parent table (but not on all partitions), logical replication
generates messages with the tag 'O' (old tuple) for updates and deletes even
for partitions that do not have full identity configured.

In those cases, only key columns are sent, and the rest of the tuple is omitted.
This contradicts the meaning of tag 'O', which, according
to the documentation [1], indicates that the full old tuple is included.

This behavior is safe for the standard PostgreSQL subscriber, which does not
rely on the tag when applying changes. However, third-party tools that consume
the logical replication stream and follow the protocol strictly can be misled.
For example, one of our clients uses a custom CDC mechanism that extracts
changes and sends them to Oracle. Their handler interprets the 'O' tag as a
signal that the full old row is available. When it is not, the data is
processed incorrectly.
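
As a rough illustration of how a strict consumer reads the tag, the sketch
below parses a non-streamed pgoutput Delete message ('D', relation OID, 'K' or
'O', TupleData) using the layout from the protocol documentation [1]. All
function names are hypothetical, and the sample message is built by hand
rather than captured from a server; it is not the client's actual handler.

```python
import struct

UNCHANGED = object()  # marker for an unchanged TOASTed value ('u')


def parse_tuple_data(buf, pos):
    """Parse a TupleData block; return (column values, new offset)."""
    (ncols,) = struct.unpack_from("!h", buf, pos)  # Int16 column count
    pos += 2
    cols = []
    for _ in range(ncols):
        kind = buf[pos:pos + 1]
        pos += 1
        if kind == b"n":            # NULL value
            cols.append(None)
        elif kind == b"u":          # unchanged TOASTed value, not sent
            cols.append(UNCHANGED)
        elif kind in (b"t", b"b"):  # text or binary value with Int32 length
            (length,) = struct.unpack_from("!i", buf, pos)
            pos += 4
            raw = buf[pos:pos + length]
            pos += length
            cols.append(raw.decode() if kind == b"t" else raw)
        else:
            raise ValueError(f"unknown column kind {kind!r}")
    return cols, pos


def parse_delete(buf):
    """Parse a non-streamed Delete message (assumption: no xid field)."""
    if buf[0:1] != b"D":
        raise ValueError("not a Delete message")
    (rel_oid,) = struct.unpack_from("!I", buf, 1)
    tag = buf[5:6]
    if tag not in (b"K", b"O"):
        raise ValueError(f"unexpected tuple tag {tag!r}")
    cols, _ = parse_tuple_data(buf, 6)
    # A strict consumer trusts the tag: 'O' means "full old row".
    return {"relation_oid": rel_oid,
            "old_is_full_row": tag == b"O",
            "columns": cols}


def build_delete(rel_oid, tag, values):
    """Hand-build a Delete message for the demo (text columns only)."""
    out = b"D" + struct.pack("!I", rel_oid) + tag
    out += struct.pack("!h", len(values))
    for v in values:
        if v is None:
            out += b"n"
        else:
            data = v.encode()
            out += b"t" + struct.pack("!i", len(data)) + data
    return out


# Key-only data wrongly tagged 'O': the consumer believes the row is full.
mislabeled = parse_delete(build_delete(16384, b"O", ["42", None]))
# The same data tagged 'K': the consumer knows only the key is present.
correct = parse_delete(build_delete(16384, b"K", ["42", None]))
```

With the 'O' tag, such a consumer would take the NULL in the second column as
the real old value instead of as "not sent".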

The attached patch changes the behavior so that the 'O' or 'K' tag is chosen
based on the REPLICA IDENTITY setting of the actual partition where the row
ends up, rather than that of the parent alone:
- If the partition has REPLICA IDENTITY FULL, the full tuple is
sent and tagged 'O'.
- Otherwise, only the key columns are sent, and the tag 'K' is used.
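
The rule can be sketched in Python as follows. This only illustrates the
decision and is not the C code from the patch; the function name and the
sample relations are hypothetical. The one-letter codes are the
pg_class.relreplident values ('d' default, 'i' index, 'f' full, 'n' nothing).

```python
def old_tuple_tag(relreplident):
    """Pick the old-tuple tag from a relation's replica identity code."""
    if relreplident == "f":
        return "O"   # full old row is available
    if relreplident in ("d", "i"):
        return "K"   # only the key columns are available
    if relreplident == "n":
        return None  # no old tuple information at all
    raise ValueError(f"unknown relreplident {relreplident!r}")


# Hypothetical setup: FULL on the parent only, default on the leaf.
relations = {"parent": "f", "part_1": "d"}

tag_from_parent = old_tuple_tag(relations["parent"])  # mislabels the data
tag_from_leaf = old_tuple_tag(relations["part_1"])    # matches what is sent
```

Choosing the tag from the leaf partition gives 'K' here, which matches the
key-only data that is actually written for that partition.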

This aligns the behavior with the protocol documentation.
I have also included a TAP test, 036_partition_replica_identity.pl,
located in src/test/subscription/t/.

It demonstrates two cases:
- An update/delete on a partition with REPLICA IDENTITY FULL correctly
emits an 'O' tag with the full old row.
- An update/delete on a partition without REPLICA IDENTITY FULL currently
also emits an 'O' tag, but only with key fields, which is the problem.

After applying the patch, the second case correctly uses the 'K' tag.

This patch is a minimal change: it does not alter protocol structure
or introduce new behavior. It only ensures the implementation matches
the documentation. In the future, we might consider a broader redesign
of logical replication for partitioned tables (see [2]), but this is
a narrow fix that solves a real inconsistency.

Looking forward to your comments.

Best regards,
Mikhail Kharitonov

[1] https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html
[2] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql

On Mon, May 12, 2025 at 5:25 PM Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:
>
> Hi!
>
> This is probably not the most familiar part of Postgres to me, but does it break anything? Or is it just inconsistency in the replication protocol?
>
> A test for the described scenario would be a great addition. And, if it is feasible, provide an example of what would be broken with the way partitioned tables are replicated now.
>
> There is a chance that the replication protocol for partitioned tables needs to be rewritten, and I sincerely hope that I am wrong about this. It seems Alvaro Herrera tried this here [0].
>
>
> [0] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql
>
>
> --
> Best regards,
> Maxim Orlov.

Attachment Content-Type Size
0001-Fix-replica-identity-flags-for-partitioned-tables (2).patch application/octet-stream 6.8 KB
036_partition_replica_identity.pl application/octet-stream 4.0 KB
