Below are the detailed steps to reproduce the issue on PG13:

(Note: Since these steps are intended to be run manually, short delays like sleep 1 between steps are assumed and not explicitly mentioned. Any wait time longer than one second is explicitly called out.)

--------------------

1. Set up the primary and subscriber nodes with the same configurations as shared in reproduce_data_duplicate_without_twophase.sh. (The script can be used to do the initial setup.)

2. On Primary: Create table tab1, insert a value and create a publication.
   psql -d postgres -p $port_primary -c "CREATE TABLE tab1(a int); INSERT INTO tab1 VALUES(1); CREATE PUBLICATION pub FOR TABLE tab1;"

3. On Subscriber: Create the same table tab1.
   psql -d postgres -p $port_subscriber -c "CREATE TABLE tab1(a int);"

4. On Subscriber: Start the subscription with copy_data set to false.
   psql -d postgres -p $port_subscriber -c "CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=$port_primary' PUBLICATION pub WITH (slot_name='logicalslot', create_slot=true, copy_data = false, enabled=true)"

5. On Primary: Confirm the slot details.
   psql -d postgres -p $port_primary -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'"

6. On Primary: Insert data into tab1. The apply worker's origin_lsn and the slot's confirmed_flush will be advanced to this INSERT's lsn (say lsn1).
   psql -d postgres -p $port_primary -c "INSERT INTO tab1 VALUES(2);"

7. Check confirmed_flush and origin_lsn; both should now match the LSN of the insert above (lsn1).
   psql -d postgres -p $port_subscriber -c "select * from pg_replication_origin_status where local_id = 1;"
   psql -d postgres -p $port_primary -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'"

8. On Primary: Add a new table (tab2) to the publication.
   psql -d postgres -p $port_primary -c "CREATE TABLE tab2 (a int UNIQUE); ALTER PUBLICATION pub ADD TABLE tab2;"

9. On Subscriber: Create tab2.
   psql -d postgres -p $port_subscriber -c "CREATE TABLE tab2 (a int UNIQUE);"

10. On Subscriber: Refresh the subscription. It will start tablesync for tab2.
    psql -d postgres -p $port_subscriber -c "ALTER SUBSCRIPTION sub REFRESH PUBLICATION"

11. Attach a debugger to the tablesync worker and hold it just before it sets the state to SUBREL_STATE_SYNCWAIT.

12. On Primary: Insert a row into tab2. Let's say the remote lsn for this change is lsn2.
    psql -d postgres -p $port_primary -c "INSERT INTO tab2 VALUES(2);"

13. Wait for 3+ seconds. The above insert will not be consumed by the tablesync worker on sub yet. The apply worker will see this change and will ignore it.

14. Check that confirmed_flush has now moved to lsn2 (where lsn2 > lsn1) due to keepalive message handling in the apply worker, while origin_lsn remains unchanged.
    psql -d postgres -p $port_subscriber -c "select * from pg_replication_origin_status where local_id = 1;"
    psql -d postgres -p $port_primary -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'"

15. Attach another debugger to the apply worker process and hold it just after the maybe_reread_subscription() call but before process_syncing_tables().

16. At the tablesync debugger: release the worker's hold and let it continue. It will wait for the apply worker to move the state to SUBREL_STATE_CATCHUP before proceeding.

17. At the apply worker debugger: release the hold and let it finish process_syncing_tables(); it will move the state to SUBREL_STATE_CATCHUP. BUT hold it again just after process_syncing_tables() finishes.

18. Concurrently, before the tablesync catches up, attach the debugger to the tablesync worker again, wait until process_syncing_tables_for_sync() is reached, and hold it just before it sets the state to SUBREL_STATE_SYNCDONE. The tablesync will now catch up (consume the data inserted above in tab2) and should wait in the debugger just before setting the state to SUBREL_STATE_SYNCDONE.

19. Release the tablesync worker from the debugger.
    - It will now move to the SUBREL_STATE_SYNCDONE state.
    Note: the apply worker is on hold just after process_syncing_tables(), so it will not move the state to READY yet.

20. Disable the sub.
    psql -d postgres -p $port_subscriber -c "alter subscription sub disable;"

21. Release the apply worker from the debugger.
    - It will exit due to the sub being disabled. Tablesync will also exit here.

22. Enable the subscription again and let the apply worker start.
    - The apply worker will move the state to SUBREL_STATE_READY.
    psql -d postgres -p $port_subscriber -c "alter subscription sub enable;"

-----------
Wait here for 3+ seconds. Now the state is:
--table sync is finished on sub, changes are synced up to lsn2
--apply worker has processed and ignored the changes up to lsn2 without updating origin_lsn
--apply worker's origin_lsn at sub is still lsn1
--confirmed_flush on pub is at lsn2
-----------

23. Check the lsn values.
    psql -d postgres -p $port_primary -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'"

24. Disable the sub.
    psql -d postgres -p $port_subscriber -c "alter subscription sub disable;"

25. Re-enable the sub, attach a debugger to the walsender (on Primary) and hold it just before ProcessRepliesIfAny(). This will stop it from processing any more replies or sending any more keepalives.
    psql -d postgres -p $port_subscriber -c "alter subscription sub enable;"
    -- Since no message arrives from the walsender, let the apply worker send feedback with the flush position as lsn1 (origin_lsn). This feedback will only be processed by the walsender after we detach it from the debugger.

26. Check that the origin is still at lsn1 and confirmed_flush is at lsn2.
    psql -d postgres -p $port_subscriber -c "select * from pg_replication_origin_status where local_id = 1;"
    psql -d postgres -p $port_primary -c "SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'"

27. Disable the subscription.
    psql -d postgres -p $port_subscriber -c "alter subscription sub disable;"

28. Detach the debugger from the walsender process.
    - Before exiting, it will process the reply from the apply worker and move confirmed_flush back to lsn1.

29. Check that confirmed_flush has now moved to lsn1.
    psql -d postgres -p $port_primary -c "SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'"

30. Enable the subscription.
    - Now the walsender will start streaming from lsn1. This will result in replay of 'INSERT to tab2 (lsn2)' and data duplication in tab2.
    psql -d postgres -p $port_subscriber -c "alter subscription sub enable;"

-----------------------------------
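Several of the steps above compare the subscriber's origin_lsn with the slot's confirmed_flush by eye. As a convenience, the check could be sketched as a small shell helper. This is only an illustration and not part of the original reproduction: the function names lsn_cmp and show_lsns are hypothetical, and it assumes $port_primary and $port_subscriber are set as in the steps and that the subscription's origin has local_id = 1.

```shell
# Hypothetical helpers (not part of the original steps).

# Compare two LSNs written as "hi/lo" in hex, as psql prints them.
# Prints -1, 0 or 1 when the first LSN is lower than, equal to,
# or higher than the second one.
lsn_cmp() {
    a_hi=$((0x${1%/*})); a_lo=$((0x${1#*/}))
    b_hi=$((0x${2%/*})); b_lo=$((0x${2#*/}))
    if [ "$a_hi" -ne "$b_hi" ]; then
        if [ "$a_hi" -lt "$b_hi" ]; then echo -1; else echo 1; fi
    elif [ "$a_lo" -ne "$b_lo" ]; then
        if [ "$a_lo" -lt "$b_lo" ]; then echo -1; else echo 1; fi
    else
        echo 0
    fi
}

# Fetch origin_lsn (subscriber) and confirmed_flush (primary) and
# print them together with their comparison result.
# Assumes $port_primary and $port_subscriber are set, and that the
# origin of interest has local_id = 1, as in the steps above.
show_lsns() {
    origin_lsn=$(psql -d postgres -p "$port_subscriber" -At -c \
        "SELECT remote_lsn FROM pg_replication_origin_status WHERE local_id = 1")
    flush_lsn=$(psql -d postgres -p "$port_primary" -At -c \
        "SELECT confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = 'logicalslot'")
    if [ -z "$origin_lsn" ] || [ -z "$flush_lsn" ]; then
        echo "could not read one of the LSNs" >&2
        return 1
    fi
    echo "origin_lsn=$origin_lsn confirmed_flush=$flush_lsn cmp=$(lsn_cmp "$origin_lsn" "$flush_lsn")"
}
```

Running show_lsns after steps 7, 14, 26 and 29 would then print both values on one line. A cmp of 0 corresponds to the two positions matching (as expected after steps 7 and 29), and -1 to origin_lsn being behind confirmed_flush (as expected after steps 14 and 26).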