| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication |
| Date: | 2025-11-25 17:16:39 |
| Message-ID: | CAHGQGwEfHW-SbgDorNZw_qwzNmt=k3_L8s9Tfc0SfHsU0KkMww@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Nov 24, 2025 at 6:03 PM Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com> wrote:
>
> Hi,
>
>
> An update: I have two topics from the review perspective.
>
> On the test I proposed to be added to 030_pg_recvlogical.pl, I found a
> way to write it without using sleeps (which risk flakyness in CI). I've
> attached it as a patch for your consideration. I checked the test in the
> following way: on master it fails, but with your patch it passes.
Thanks for writing the test case and turning it into a patch. I agree that
we should add a regression test to ensure the reported issue doesn't recur.
It looks like the v1 patch you attached accidentally includes
the patch file itself. Could you remove that?
After applying the patch, git diff --check reported trailing whitespace.
Could you fix that?
+ '--fsync-interval', '1',
+ '--status-interval', '1',
Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
With a very small interval, the status feedback might happen before
the walsender is terminated and pg_recvlogical reconnects, which could
prevent the duplicate data from appearing even without the patch.
+use IPC::Run qw(start);
<snip>
+my $recv = start [
For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
just call "IPC::Run::start" directly?
+# Wait only for initial connection
+$node->poll_query_until('postgres',
+ "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
slot_name = 'reconnect_test'");
This is unlikely, but pg_recvlogical's connection could be terminated
immediately after connecting, before receiving any data. If that happens,
the test might behave unexpectedly. To make the test more robust,
should we instead poll on:
SELECT pg_read_file('$outfile') ~ 'INSERT'
instead, to ensure that some data has actually been received before
terminating the backend?
> Secondly I noticed in function sendFeedback at line 166, the startpos is
> set to output_written_lsn. This seems to counter conceptually the change
> you made in the patch, however it seems to not affect correctness. Shall
> we remove this line as to avoid confusion?
Isn't this necessary when - is specified for --file, causing OutputFsync() to
be skipped?
Regards,
--
Fujii Masao
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jacob Champion | 2025-11-25 17:19:29 | Re: [oauth] SASL mechanisms |
| Previous Message | Andrew Dunstan | 2025-11-25 17:15:20 | Re: reduce size of logs stored by buildfarm |