| From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Improve conflict detection when replication origins are reused |
| Date: | 2026-05-14 03:05:06 |
| Message-ID: | CABdArM5k7wCxq+L2r9-mLOoPXe_iHgdeRMXc31oxvzSj=xPjFQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
While reviewing the issue reported at [1] and the proposed solutions
at [2], I noticed a related problem: false negative conflict detection
when a 'ReplOriginId' gets reused.
In logical replication, conflict detection relies on the tuple’s
replication origin ('roident'). The problem is that if a subscription
is dropped and a new subscription later reuses the same origin ID, the
apply worker may incorrectly treat incoming changes as “its own”
changes and skip conflict detection.
A simple example:
1. Create subscription sub1 with 'roident = 1'
2. Replicate some rows into table 't1'
3. Drop 'sub1'
4. Create another subscription 'sub2'
5. `sub2` reuses 'roident = 1'
6. New updates arrive for rows previously written by 'sub1'
At this point, conflict detection sees:
tuple_origin == current_origin
and incorrectly assumes the row was written by the current
subscription instance, so no 'update_origin_differ' conflict is
raised.
This may look harmless in this simple setup, but it becomes
problematic if the new subscription is connected to a different
publisher, because real conflicts can then be silently missed.
I explored two possible approaches to solve this:
Approach 1. Zero out old origin IDs in commit_ts data when dropping a
subscription
----------------------
- When a subscription is dropped and its replication origin becomes
free, scan all 'commit_ts' SLRU entries and replace that old origin ID
with 'InvalidRepOriginId (0)'.
- So rows previously written by the old subscription would no longer
appear to belong to any active replication origin.
- A new subscription reusing the same 'roident' will always conflict
with origin '0'.
Pros:
- Fixes the stale-origin problem completely and may also help solve
the tablesync-origin issue discussed in [1]
- No additional checks needed during conflict detection
Cons:
- Requires scanning the entire 'commit_ts' SLRU during DROP
SUBSCRIPTION, so it can become very expensive on large systems
- Not crash-safe currently(patch):
- if the server crashes midway, some entries may still contain the
old origin ID
- after restart, reused origins can again lead to missed conflicts
- Making this fully crash-safe would likely require WAL logging or
recovery-time reprocessing.
Approach 2. Store replication origin creation time
----------------------
- Add a creation timestamp for each replication origin
- During conflict check:
if tuple_origin != current_origin
-> existing behavior
if tuple_origin == current_origin
-> compare tuple commit timestamp with origin creation time
if tuple_commit_ts <= origin_creation_time
-> treat as an origin reuse case and raise conflict
Pros:
-------
- No additional processing during DROP SUBSCRIPTION
- Lightweight runtime check (just one timestamp comparison)
- Naturally crash-safe since origin creation is WAL-logged already
Cons:
- Requires a catalog schema change
- The <= comparison can produce false-positive conflicts for rows
committed at the exact same microsecond as origin creation
- May require additional handling for upgraded origins
IMO, the second approach currently looks more practical because it
avoids the heavy SLRU scan and crash-recovery complexity.
Attached:
- Patch for approach 1
- Patch for approach 2
- A TAP test reproducing the issue
Note: The patches are manually tested for the reported issue, but not
yet tested for performance or additional edge cases.
Feedback and suggestions are welcome.
[1] https://www.postgresql.org/message-id/CALDaNm3Y6Y4Mub6QC8fZKnNy5jZspELQYCoQF_FL2Zwzweu%3Dog%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1LxGXR7jOAKh0B8N362S-Q3b6GhBxxcV_HxUaicEPq5Cg%40mail.gmail.com
--
Thanks,
Nisha
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Approach-2-Detect-origin-id-reuse-in-origin-diffe.patch | application/x-patch | 10.5 KB |
| v1-0002-Add-TAP-test-to-detect-conflict-when-origin-id-is.patch | application/x-patch | 3.2 KB |
| v1-0001-Approach-1-Zero-commit_ts-origin-on-replication-o.patch | application/x-patch | 5.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nathan Bossart | 2026-05-14 03:10:28 | Re: Vacuumlo improvements |
| Previous Message | Henson Choi | 2026-05-14 01:56:23 | Re: Remove invalid SS2/SS3 handling from EUC-KR routines |