| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication |
| Date: | 2026-01-14 01:26:36 |
| Message-ID: | CAHGQGwF5SX=PA+xANQ0fme4XnZAFEB=XNerheesOX+0EqBU6wg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Jan 12, 2026 at 4:08 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> Thanks for the patch. Here are my comments on v4.
Thanks for the review!
> 1 - 0001
> ```
> + /*
> + * Save the last flushed position as the replication start point. On
> + * reconnect, replication resumes from there to avoid re-sending flushed
> + * data.
> + */
> + startpos = output_fsync_lsn;
> ```
>
> Looking at function OutputFsync(), fsync() may fail and there a few branches to return early without fsync(), so should we only update startpos after fsync()?
Maybe not, but I might be missing something. Could you clarify what
concrete scenario would be problematic with the current code?
> 2 - 0001
> ```
> + if (outfd != -1 && strcmp(outfile, "-") != 0)
> + OutputFsync(feGetCurrentTimestamp());
> ```
>
> Do we need to check return code of OutputFsync()? I checked this file, the only caller that doesn’t check return code of OutputFsync() has a comment for why:
> ```
> /* no need to jump to error on failure here, we're finishing anyway */
> OutputFsync(t);
> ```
>
> I saw 0004 has changed OutputFsync to return nothing. So, it’s ok to not adding a comment here. But I just feel that, if we want to make the commit self-contained, it would be better to add a comment here, but that’s not a strong opinion.
Yeah, I think that making the patch "self-contained" in that sense isn't
really worth the extra effort.
> 3 - 0001
>
> No a direct issue of this patch. I noticed that, everywhere that calls OutputFsync(), it checks outfd != -1 && strcmp(outfile, "-") != 0. However, two places miss the check:
The 0001 patch updates pg_recvlogical to call OutputFsync() before
restarting replication. That call is guarded by strcmp(outfile, "-") != 0.
Your comment makes me reconsider this: when "--file -" is used,
OutputFsync() would be skipped, so startpos would not be updated before
restarting replication. That can lead to duplicate output on stdout,
which is clearly problematic. For this reason, I removed that check
in the latest 0001 patch.
In other places where we check strcmp(outfile, "-") != 0, such as:
if (outfd != -1 && output_reopen && strcmp(outfile, "-") != 0)
{
now = feGetCurrentTimestamp();
OutputFsync(now);
close(outfd);
outfd = -1;
}
on second thought, the check seems necessary for close(outfd) and
"outfd = -1", but not for calling OutputFsync(). If that understanding is
correct, it might make sense to adjust where the check is applied.
However, I think that should be handled in a separate patch.
> 4 - 0002
> ```
> + croak "timed out waiting for match: $regexp”;
> ```
>
> Is it more helpful to include filename in the error message?
OK, I've updated the message to include the filename.
> 5 - 0003
> ```
> +my ($stdout, $stderr);
> +my $recv = IPC::Run::start(
> + [(at)pg_recvlogical_cmd],
> + '>' => \$stdout,
> + '2>' => \$stderr);
> ```
>
> $stdout and $stderr are never used.
Yes, but I'm fine with keeping them as they are.
I've attached the updated version of the patches upthread.
Regards,
--
Fujii Masao
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andreas Karlsson | 2026-01-14 01:43:22 | Re: Proposal - Enabling btree_gist by default |
| Previous Message | Marcos Magueta | 2026-01-14 01:23:03 | Re: WIP - xmlvalidate implementation from TODO list |