| From: | Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication |
| Date: | 2025-11-26 13:25:31 |
| Message-ID: | d616631a-6621-4221-9452-4dd832d013cd@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On 25/11/2025 17:16, Fujii Masao wrote:
> Thanks for writing the test case and turning it into a patch. I agree that
> we should add a regression test to ensure the reported issue doesn't recur.
Thanks for your feedback, updated patch is attached. Again, I checked
that it fails in master, but passes with your patch.
> It looks like the v1 patch you attached accidentally includes
> the patch file itself. Could you remove that?
Whoops, not sure what happened there, fixed.
> + '--fsync-interval', '1',
> + '--status-interval', '1',
>
> Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
> With a very small interval, the status feedback might happen before
> the walsender is terminated and pg_recvlogical reconnects, which could
> prevent the duplicate data from appearing even without the patch.
Yes indeed good one. I actually had it set to 60 in the previous version
I sent earlier.
> +use IPC::Run qw(start);
> <snip>
> +my $recv = start [
>
> For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
> just call "IPC::Run::start" directly?
Indeed, done.
> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> + "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
> SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?
> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> + "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
> SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?
>
>
>
>> Secondly I noticed in function sendFeedback at line 166, the startpos is
>> set to output_written_lsn. This seems to counter conceptually the change
>> you made in the patch, however it seems to not affect correctness. Shall
>> we remove this line as to avoid confusion?
> Isn't this necessary when - is specified for --file, causing
> OutputFsync() to be skipped? Regards,
Yes, for sure. Would really like to avoid introducing flake in CI due to
this test.
> Isn't this necessary when - is specified for --file, causing
> OutputFsync() to be skipped?
Upon another look, indeed. When writing to a regular file (--file -)
that assignment is redundant but harmless. But like you said, when
writing to stdout, without that line, startpos would never be updated.
> Additionally, when the --no-loop option is used, I found that
> pg_recvlogical
> could previously exit without flushing written data, risking data loss.
> The attached patch fixes this issue by also ensuring that all data is
> flushed
> to disk before exiting with --no-loop.
Should we think of some kind of test also for this part?
--
Thanks,
Mircea Cadariu
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0002-Add-test-for-pg_recvlogical-reconnection.patch | text/plain | 2.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Laurenz Albe | 2025-11-26 13:28:06 | Re: System views for versions reporting |
| Previous Message | atma ram | 2025-11-26 13:22:07 | Question on PostgreSQL Table Partitioning – Performance of Queries That Do Not Use the Partition Key |