Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
Date: 2026-01-14 01:26:36
Message-ID: CAHGQGwF5SX=PA+xANQ0fme4XnZAFEB=XNerheesOX+0EqBU6wg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 12, 2026 at 4:08 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> Thanks for the patch. Here are my comments on v4.

Thanks for the review!

> 1 - 0001
> ```
> + /*
> + * Save the last flushed position as the replication start point. On
> + * reconnect, replication resumes from there to avoid re-sending flushed
> + * data.
> + */
> + startpos = output_fsync_lsn;
> ```
>
> Looking at function OutputFsync(), fsync() may fail and there a few branches to return early without fsync(), so should we only update startpos after fsync()?

Maybe not, but I might be missing something. Could you clarify what
concrete scenario would be problematic with the current code?

> 2 - 0001
> ```
> + if (outfd != -1 && strcmp(outfile, "-") != 0)
> + OutputFsync(feGetCurrentTimestamp());
> ```
>
> Do we need to check return code of OutputFsync()? I checked this file, the only caller that doesn’t check return code of OutputFsync() has a comment for why:
> ```
> /* no need to jump to error on failure here, we're finishing anyway */
> OutputFsync(t);
> ```
>
> I saw 0004 has changed OutputFsync to return nothing. So, it’s ok to not adding a comment here. But I just feel that, if we want to make the commit self-contained, it would be better to add a comment here, but that’s not a strong opinion.

Yeah, I think that making the patch "self-contained" in that sense isn't
really worth the extra effort.

> 3 - 0001
>
> No a direct issue of this patch. I noticed that, everywhere that calls OutputFsync(), it checks outfd != -1 && strcmp(outfile, "-") != 0. However, two places miss the check:

The 0001 patch updates pg_recvlogical to call OutputFsync() before
restarting replication. That call is guarded by strcmp(outfile, "-") != 0.
Your comment makes me reconsider this: when "--file -" is used,
OutputFsync() would be skipped, so startpos would not be updated before
restarting replication. That can lead to duplicate output on stdout,
which is clearly problematic. For this reason, I removed that check
in the latest 0001 patch.

In other places where we check strcmp(outfile, "-") != 0, such as:

if (outfd != -1 && output_reopen && strcmp(outfile, "-") != 0)
{
now = feGetCurrentTimestamp();
OutputFsync(now);
close(outfd);
outfd = -1;
}

on second thought, the check seems necessary for close(outfd) and
"outfd = -1", but not for calling OutputFsync(). If that understanding is
correct, it might make sense to adjust where the check is applied.
However, I think that should be handled in a separate patch.

> 4 - 0002
> ```
> + croak "timed out waiting for match: $regexp”;
> ```
>
> Is it more helpful to include filename in the error message?

OK, I've updated the message to include the filename.

> 5 - 0003
> ```
> +my ($stdout, $stderr);
> +my $recv = IPC::Run::start(
> + [(at)pg_recvlogical_cmd],
> + '>' => \$stdout,
> + '2>' => \$stderr);
> ```
>
> $stdout and $stderr are never used.

Yes, but I'm fine with keeping them as they are.

I've attached the updated version of the patches upthread.

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2026-01-14 01:43:22 Re: Proposal - Enabling btree_gist by default
Previous Message Marcos Magueta 2026-01-14 01:23:03 Re: WIP - xmlvalidate implementation from TODO list