pg_recvlogical: Prevent flushed data from being re-sent after restarting replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
Date: 2025-09-04 16:16:14
Message-ID: CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
Lists: pgsql-hackers

Hi,

When pg_recvlogical loses its connection, it reconnects and restarts
replication unless the --no-loop option is used. I noticed that in this
scenario, data that has already been flushed can be re-sent after
replication restarts. This happens because the start position used for
the restart is taken from the write position in the last status update
message, which may be older than the actual position of the last flushed
data. As a result, some flushed data can lie beyond the replication start
position and be re-sent. Is this a bug?
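
To make the scenario concrete, here is a toy model of the positions
involved. It is only a sketch, not the actual pg_recvlogical code: the
lsn_t type, the receiver_state struct, and all function names are made up
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not PostgreSQL's own type. */
typedef uint64_t lsn_t;

/* Toy receiver state; the field names are illustrative only. */
typedef struct {
    lsn_t reported_write; /* write position in the last status update sent */
    lsn_t written;        /* end of the data written to the output file */
    lsn_t flushed;        /* end of the data known to be flushed to disk */
} receiver_state;

/* Data up to `upto` arrives and is written locally. */
static void receive(receiver_state *s, lsn_t upto) { s->written = upto; }

/* Everything written so far is flushed to disk. */
static void flush(receiver_state *s) { s->flushed = s->written; }

/* A status update reports the current write position to the server. */
static void send_status(receiver_state *s) { s->reported_write = s->written; }

/*
 * Amount of already-flushed data that would be streamed again if
 * replication restarted at `start`.
 */
static lsn_t resent_after_restart(const receiver_state *s, lsn_t start)
{
    assert(s->flushed <= s->written);
    return s->flushed > start ? s->flushed - start : 0;
}
```

In this model, if a status update is sent at position 100, more data is
then received and flushed up to 250, and the connection is lost before the
next status update, restarting from reported_write streams the range
between 100 and 250 a second time.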

To fix this, I'd like to propose the attached patch, which ensures that
all written data is flushed to disk before restarting replication and
uses the last flushed position as the replication start point. This
prevents already flushed data from being re-sent.
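
The idea can be sketched roughly as follows. This is a simplified
illustration, not the patch itself: output_state, flush_output(), and
prepare_restart() are hypothetical names, and the fail_flush field exists
only to simulate an fsync() failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not PostgreSQL's own type. */
typedef uint64_t lsn_t;

/* Toy output state; the field names are illustrative only. */
typedef struct {
    lsn_t written;    /* end of the data written to the output file */
    lsn_t flushed;    /* end of the data known to be flushed to disk */
    bool  fail_flush; /* simulation knob: pretend that fsync() fails */
} output_state;

/* Stand-in for flushing the output file; real code would call fsync(). */
static bool flush_output(output_state *out)
{
    if (out->fail_flush)
        return false;
    out->flushed = out->written;
    return true;
}

/*
 * Before reconnecting: flush whatever is still pending, then restart from
 * the flushed position. On failure, *start is left untouched so that the
 * caller can report the error instead of restarting.
 */
static bool prepare_restart(output_state *out, lsn_t *start)
{
    assert(out->flushed <= out->written);
    if (out->flushed < out->written && !flush_output(out))
        return false;
    *start = out->flushed;
    return true;
}
```

Because the flush happens first, the flushed position equals the write
position at restart time, so nothing that is already on disk lies beyond
the new start point.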

Additionally, I found that when the --no-loop option is used,
pg_recvlogical can exit without flushing written data, risking data loss.
The attached patch also fixes this by ensuring that all data is flushed
to disk before exiting with --no-loop.
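
As a rough illustration of the exit path, the sketch below uses plain
POSIX calls (open, write, fsync, close). The output_file struct and the
output_* helpers are hypothetical and do not mirror the real
pg_recvlogical code; the point is only that the final fsync() happens
before the process exits.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical output handle; `dirty` means written but not yet fsync()ed. */
typedef struct {
    int  fd;
    bool dirty;
} output_file;

/* Open (or create) the output file for appending. */
static bool output_open(output_file *out, const char *path)
{
    out->fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    out->dirty = false;
    return out->fd >= 0;
}

/* Write the whole buffer, retrying on short writes. */
static bool output_write(output_file *out, const char *buf, size_t len)
{
    assert(out->fd >= 0);
    while (len > 0) {
        ssize_t n = write(out->fd, buf, len);
        if (n < 0)
            return false;
        out->dirty = true;
        buf += n;
        len -= (size_t) n;
    }
    return true;
}

/*
 * Flush pending data and close the file. Meant to run on every exit
 * path, including a one-shot run. Returns false if fsync() or close()
 * fails, so the caller can exit with an error status.
 */
static bool output_finish(output_file *out)
{
    bool ok = true;

    if (out->dirty) {
        if (fsync(out->fd) == 0)
            out->dirty = false;
        else
            ok = false;
    }
    if (close(out->fd) != 0)
        ok = false;
    out->fd = -1;
    return ok;
}
```

A caller would run output_finish() before returning from main and turn a
false result into a non-zero exit code, so that a failed flush is not
silently ignored.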

Thoughts?

Regards,

--
Fujii Masao

Attachment Content-Type Size
v1-0001-pg_recvlogical-Prevent-flushed-data-from-being-re.patch application/octet-stream 2.2 KB
