RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2022-03-18 05:13:02
Message-ID: OS3PR01MB6275C67F14954E05CE5D04399E139@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 17, 2022 at 7:52 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
Thanks for your comments.

> On Thu, Mar 17, 2022 at 7:14 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 17, 2022 at 12:27 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > On Wed, Mar 16, 2022 at 7:38 PM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > After more thought, can we check only wal_sender_timeout without
> > > > skip-count? That is, in WalSndUpdateProgress(), if we have
> > > > received any reply from the subscriber in last (wal_sender_timeout
> > > > / 2), we don't need to do anything in terms of keep-alive. If not,
> > > > we do
> > > > ProcessRepliesIfAny() (and probably WalSndCheckTimeOut()?) then
> > > > WalSndKeepalivesIfNecessary(). That way, we can send keep-alive
> > > > messages every (wal_sender_timeout / 2). And since we don't call
> > > > them for every change, we would not need to worry about the overhead
> much.
> > > >
> > >
> > > But won't that lead to a call to GetCurrentTimestamp() for each
> > > change we skip? IIUC from previous replies that lead to a slight
> > > slowdown in previous tests of Wang-San.
> > >
> > If the above is true then I think we can use a lower skip_count say 10
> > along with a timeout mechanism to send keepalive message. This will
> > help us to alleviate the overhead Wang-San has shown.
>
> Using both sounds reasonable to me. I'd like to see how much the overhead is
> alleviated by using skip_count 10 (or 100).
>
> > BTW, I think there could be one other advantage of using
> > ProcessRepliesIfAny() (as you are suggesting) is that it can help to
> > release sync waiters if there are any. I feel that would be the case
> > for the skip_empty_transactions patch [1] which uses
> > WalSndUpdateProgress to send keepalive messages after skipping empty
> > transactions.
>
> +1
I modified the patch according to your and Amit-San's suggestions.
In addition, after testing, I found that when the threshold is 10, it brings
slight overhead.
So I try to change it to 100, after testing, the results look good to me.
10 : 1.22%--UpdateProgress
100 : 0.16%--UpdateProgress

Please refer to attachment.

Attach the new patch.
1. Refactor the way to send keepalive messages.
[suggestion by Sawada-San, Amit-San.]
2. Modify the value of flag is_send initialization to make it look more
reasonable. [suggestion by Kuroda-San.]
3. Improve new function names.
(From SendKeepaliveIfNecessary to UpdateProgress.)

Regards,
Wang wei

Attachment Content-Type Size
v3-0001-Fix-the-timeout-of-subscriber-in-long-transaction.patch application/octet-stream 14.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2022-03-18 05:13:48 RE: Logical replication timeout problem
Previous Message Stephen Frost 2022-03-18 04:45:49 Re: Proposal: Support custom authentication methods using hooks