Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication

From: Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
Date: 2025-11-26 13:25:31
Message-ID: d616631a-6621-4221-9452-4dd832d013cd@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 25/11/2025 17:16, Fujii Masao wrote:
> Thanks for writing the test case and turning it into a patch. I agree that
> we should add a regression test to ensure the reported issue doesn't recur.
Thanks for your feedback, updated patch is attached. Again, I checked
that it fails in master, but passes with your patch.
> It looks like the v1 patch you attached accidentally includes
> the patch file itself. Could you remove that?
Whoops, not sure what happened there, fixed.
> + '--fsync-interval', '1',
> + '--status-interval', '1',
>
> Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
> With a very small interval, the status feedback might happen before
> the walsender is terminated and pg_recvlogical reconnects, which could
> prevent the duplicate data from appearing even without the patch.
Yes indeed good one. I actually had it set to 60 in the previous version
I sent earlier.

> +use IPC::Run qw(start);
> <snip>
> +my $recv = start [
>
> For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
> just call "IPC::Run::start" directly?
Indeed, done.

> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> + "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
> SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?

> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> + "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
> SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?
>
>
>
>> Secondly I noticed in function sendFeedback at line 166, the startpos is
>> set to output_written_lsn. This seems to counter conceptually the change
>> you made in the patch, however it seems to not affect correctness. Shall
>> we remove this line as to avoid confusion?
> Isn't this necessary when - is specified for --file, causing
> OutputFsync() to be skipped? Regards,
Yes, for sure. Would really like to avoid introducing flake in CI due to
this test.

> Isn't this necessary when - is specified for --file, causing
> OutputFsync() to be skipped?
Upon another look, indeed. When writing to a regular file (--file -)
that assignment is redundant but harmless. But like you said, when
writing to stdout, without that line, startpos would never be updated.

> Additionally, when the --no-loop option is used, I found that
> pg_recvlogical
> could previously exit without flushing written data, risking data loss.
> The attached patch fixes this issue by also ensuring that all data is
> flushed
> to disk before exiting with --no-loop.

Should we think of some kind of test also for this part?

--
Thanks,
Mircea Cadariu

Attachment Content-Type Size
v1-0002-Add-test-for-pg_recvlogical-reconnection.patch text/plain 2.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2025-11-26 13:28:06 Re: System views for versions reporting
Previous Message atma ram 2025-11-26 13:22:07 Question on PostgreSQL Table Partitioning – Performance of Queries That Do Not Use the Partition Key