RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2022-02-15 05:18:20
Message-ID: OS3PR01MB6275FD0D80CA1C7A3C2855B49E349@OS3PR01MB6275.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tue, Feb 8, 2022 at 17:18 Kuroda, Hayato <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> I applied your patch and confirmed that codes passed regression test.
> I put a short reviewing:
Thanks for your test and review.

> ```
> + static int skipped_changes_count = 0;
> +
> + /*
> + * Conservatively, at least 150,000 changes can be skipped in 1s.
> + *
> + * Because we use half of wal_sender_timeout as the threshold, and the unit
> + * of wal_sender_timeout in process is ms, the final threshold is
> + * wal_sender_timeout * 75.
> + */
> + int skipped_changes_threshold = wal_sender_timeout * 75;
> ```
>
> I'm not sure but could you tell me the background of this calculation?
> Is this assumption reasonable?
As we discussed, we need to send keepalive messages to the subscriber when
skipping changes.
One approach is that **for each skipped change** we try to send a keepalive
message, calculating whether a timeout would occur from the current time and
the last time a keepalive was sent. But doing this for every single change
brings a slight overhead.
So I want to try another approach: only after **continuously skipping some
number of changes** do we try to send a keepalive message, again calculating
whether a timeout would occur from the current time and the last time a
keepalive was sent.

IMO, we should send a keepalive message only after continuously skipping a
certain number of changes.
I want to derive that threshold dynamically from a fixed per-second rate, to
avoid adding too much code.
In addition, machine performance differs from user to user, and users can
modify wal_sender_timeout, so the threshold should be calculated dynamically
from wal_sender_timeout.

Based on this, I tested on machines with different configurations and took the
result from the machine with the lowest configuration.
[results]
The number of changes that can be skipped per second : 537087 (average)
To be safe, I set the rate to 150000.
(wal_sender_timeout / 2 / 1000 * 150000 = wal_sender_timeout * 75)

The spec of the test server used to derive the threshold is:
CPU information : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Memory information : 816188 kB

> ```
> @@ -654,20 +663,62 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> {
> case REORDER_BUFFER_CHANGE_INSERT:
> if (!relentry->pubactions.pubinsert)
> + {
> + if (++skipped_changes_count >= skipped_changes_threshold)
> + {
> + OutputPluginUpdateProgress(ctx, true);
> +
> + /*
> + * After sending keepalive message, reset
> + * skipped_changes_count.
> + */
> + skipped_changes_count = 0;
> + }
> return;
> + }
> break;
> ```
>
> Is the if-statement needed? In the walsender case
> OutputPluginUpdateProgress() leads WalSndUpdateProgress(), and the
> function also has the threshold for ping-ing.
As mentioned above, we need to skip some number of changes continuously before
checking whether a timeout would occur.
Without this if-statement, the timeout check would run every time a single
change is skipped, which brings extra overhead.
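To illustrate the saving, here is a minimal standalone sketch (hypothetical
names, not the patch itself): `time_to_ping()` stands in for the expensive
`GetCurrentTimestamp()`-based check, and the counter ensures it runs only once
per threshold's worth of skipped changes instead of once per change.

```c
#include <stdbool.h>

static int	wal_sender_timeout = 60000;	/* ms, PostgreSQL default */

/* Counts how often the (expensive) clock/keepalive check actually runs. */
static long clock_reads = 0;

/* Stand-in for the GetCurrentTimestamp()-based timeout check. */
static bool
time_to_ping(void)
{
	clock_reads++;
	return false;		/* pretend no timeout is near */
}

/* Called once per skipped change; checks the clock once per threshold. */
static void
skipped_change(void)
{
	static int	skipped_changes_count = 0;
	int			skipped_changes_threshold = wal_sender_timeout * 75;

	if (++skipped_changes_count >= skipped_changes_threshold)
	{
		(void) time_to_ping();
		skipped_changes_count = 0;	/* reset after the check */
	}
}

/* Skip n changes and report how many clock reads that cost. */
static long
simulate_skips(long n)
{
	clock_reads = 0;
	for (long i = 0; i < n; i++)
		skipped_change();
	return clock_reads;
}
```

Skipping 10 million changes costs only 2 clock reads here (threshold is
4,500,000), versus 10 million reads with a per-change check.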

> ```
> static void
> -WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
> +WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool send_keep_alive)
> {
> - static TimestampTz sendTime = 0;
> + static TimestampTz trackTime = 0;
> TimestampTz now = GetCurrentTimestamp();
>
> + if (send_keep_alive)
> + {
> + /*
> + * If half of wal_sender_timeout has lapsed without send message standby,
> + * send a keep-alive message to the standby.
> + */
> + static TimestampTz sendTime = 0;
> + TimestampTz ping_time = TimestampTzPlusMilliseconds(sendTime,
> +        wal_sender_timeout / 2);
> + if (now >= ping_time)
> + {
> + WalSndKeepalive(false);
> +
> + /* Try to flush pending output to the client */
> + if (pq_flush_if_writable() != 0)
> + WalSndShutdown();
> + sendTime = now;
> + }
> + }
> +
> ```
>
> * +1 about renaming to trackTime.
> * `/2` might be magic number. How about following? Renaming is very welcome:
>
> ```
> +#define WALSND_LOGICAL_PING_FACTOR 0.5
> + static TimestampTz sendTime = 0;
> + TimestampTz ping_time = TimestampTzPlusMilliseconds(sendTime,
> +        wal_sender_timeout * WALSND_LOGICAL_PING_FACTOR)
> ```
In the existing code, similar operations on wal_sender_timeout use the
(wal_sender_timeout / 2) style, e.g. in function WalSndKeepaliveIfNecessary.
So I think this patch should stay consistent with that.
That said, I agree that avoiding the magic number would be better. Maybe we
could improve all these call sites together in a new thread.

> Could you add a commitfest entry for cfbot?
Thanks for the reminder, I will add it soon.

Regards,
Wang wei
