Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root

From: Mikhail Kharitonov <mikhail(dot)kharitonov(dot)dev(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
Date: 2025-07-08 08:53:43
Message-ID: CAKkoVatYsZLBzmMFsNJZYTLRAP23Ys-Q4GVh8UNf1EjKHSXmDQ@mail.gmail.com
Lists: pgsql-hackers

Hi all,

I’m sending v2 of the patch. This is a clean rebase onto current master
(commit a27893df45e) and a squash of the fix together with the TAP
test into a single patch file.

I would appreciate your thoughts and comments on the current problem.

Thank you!

--
Best regards,
Mikhail Kharitonov

On Thu, May 29, 2025 at 9:30 AM Mikhail Kharitonov
<mikhail(dot)kharitonov(dot)dev(at)gmail(dot)com> wrote:
>
> Hi,
>
> Thank you for the feedback.
>
> I would like to clarify that the current behavior does not break replication
> between PostgreSQL instances. The logical replication stream is still accepted
> by the subscriber, and the data is applied correctly. However, the protocol
> semantics are violated, which may cause issues for external systems that rely
> on interpreting this stream.
>
> When using publish_via_partition_root = true and setting REPLICA IDENTITY FULL
> only on the parent table (but not on all partitions), logical replication
> generates messages with the tag 'O' (old tuple) for updates and deletes even
> for partitions that do not have full identity configured.
>
> In those cases, only key columns are sent, and the rest of the tuple is omitted.
> This contradicts the meaning of tag 'O', which, according
> to the documentation [1], indicates that the full old tuple is included.
>
> This behavior is safe for the standard PostgreSQL subscriber, which does not
> rely on the tag when applying changes. However, third-party tools that consume
> the logical replication stream and follow the protocol strictly can be misled.
> For example, one of our clients uses a custom CDC mechanism that extracts
> changes and sends them to Oracle. Their handler interprets the 'O' tag as a
> signal that the full old row is available. When it is not, the data is
> processed incorrectly.
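
To make the consumer-side problem concrete, here is a minimal Python sketch
(not part of the patch) of how a strict consumer might decode a non-streamed
Delete message laid out as described in [1]. The names parse_tuple_data and
parse_delete are made up for illustration, and the sample message is built by
hand rather than captured from a server.

```python
import struct


def parse_tuple_data(buf, pos):
    """Parse a TupleData submessage at buf[pos]; return (columns, new_pos)."""
    (ncols,) = struct.unpack_from("!h", buf, pos)
    pos += 2
    columns = []
    for _ in range(ncols):
        kind = buf[pos:pos + 1]
        pos += 1
        if kind in (b"n", b"u"):
            # 'n' = NULL, 'u' = unchanged TOASTed value; no payload follows.
            columns.append((kind.decode(), None))
        elif kind in (b"t", b"b"):
            # 't' = text value, 'b' = binary value; Int32 length, then bytes.
            (length,) = struct.unpack_from("!i", buf, pos)
            pos += 4
            columns.append((kind.decode(), buf[pos:pos + length]))
            pos += length
        else:
            raise ValueError(f"unknown column kind {kind!r}")
    return columns, pos


def parse_delete(buf):
    """Parse a Delete ('D') message; return (relation_oid, tag, columns)."""
    if buf[0:1] != b"D":
        raise ValueError("not a Delete message")
    (relation_oid,) = struct.unpack_from("!I", buf, 1)
    tag = buf[5:6].decode()
    if tag not in ("K", "O"):
        raise ValueError(f"unexpected tuple tag {tag!r}")
    columns, _ = parse_tuple_data(buf, 6)
    return relation_oid, tag, columns


# Hand-built sample: relation OID 16384, tag 'K', two columns
# (a text key value "1" followed by a NULL).
sample = (
    b"D" + struct.pack("!I", 16384) + b"K"
    + struct.pack("!h", 2)
    + b"t" + struct.pack("!i", 1) + b"1"
    + b"n"
)
relation_oid, tag, columns = parse_delete(sample)
```

A handler that trusts the tag treats 'O' as a promise that every column of
the old row is usable, which is exactly the assumption that breaks in the
scenario above.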
>
> The attached patch changes the behavior so that the 'O' or 'K' tag is chosen
> based on the REPLICA IDENTITY setting of the actual partition where the row
> ends up, rather than that of the parent only.
> - If the partition has REPLICA IDENTITY FULL, the full tuple is
> sent and tagged 'O'.
> - Otherwise, only the key columns are sent, and the tag 'K' is used.
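
The rule can be modelled in a few lines of Python. This is only an
illustration of the behavior described in this message, not the actual C
change in the patch; the Relation class and both functions are hypothetical
names.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical stand-in for a table and its REPLICA IDENTITY setting."""
    name: str
    replica_identity_full: bool


def old_tuple_tag(relation):
    """'O' when the full old row is available, 'K' when only the key is."""
    return "O" if relation.replica_identity_full else "K"


def tag_before_patch(parent, partition):
    # With publish_via_partition_root = true, the tag follows the parent,
    # regardless of how the partition itself is configured.
    return old_tuple_tag(parent)


def tag_after_patch(parent, partition):
    # The tag follows the partition that actually holds the row.
    return old_tuple_tag(partition)


parent = Relation("parent_tbl", replica_identity_full=True)
part_full = Relation("part_full", replica_identity_full=True)
part_default = Relation("part_default", replica_identity_full=False)
```

With these sample relations, the two functions agree for part_full and
differ only for part_default, which is the case the patch targets.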
>
> This aligns the behavior with the protocol documentation.
> I have also included a TAP test: 036_partition_replica_identity.pl,
> located in src/test/subscription/t/
>
> It demonstrates two cases:
> - An update/delete on a partition with REPLICA IDENTITY FULL correctly
> emits an 'O' tag with the full old row.
> - An update/delete on a partition without REPLICA IDENTITY FULL currently
> also emits an 'O' tag, but only with key fields; this is the problem.
>
> After applying the patch, the second case correctly uses the 'K' tag.
>
> This patch is a minimal change: it does not alter protocol structure
> or introduce new behavior. It only ensures the implementation matches
> the documentation. In the future, we might consider a broader redesign
> of logical replication for partitioned tables (see [2]), but this is
> a narrow fix that solves a real inconsistency.
>
> Looking forward to your comments.
>
> Best regards,
> Mikhail Kharitonov
>
> [1] https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html
> [2] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql
>
> On Mon, May 12, 2025 at 5:25 PM Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:
> >
> > Hi!
> >
> > This is probably not the most familiar part of Postgres to me, but does it break anything? Or is it just an inconsistency in the replication protocol?
> >
> > A test for the described scenario would be a great addition. And, if it is feasible, provide an example of what would be broken with the way partitioned tables are replicated now.
> >
> > There is a chance that the replication protocol for partitioned tables needs to be rewritten, and I sincerely hope that I am wrong about this. It seems Alvaro Herrera tried this here [0].
> >
> >
> > [0] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql
> >
> >
> > --
> > Best regards,
> > Maxim Orlov.
