Quick Links

Re: Logical replication timeout problem

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	"wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject:	Re: Logical replication timeout problem
Date:	2023-01-31 23:13:19
Message-ID:	CAHut+PvQPkATD5m=AqrUN9DvOrCk+U+anBfK4fPSSNT__8H0zg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Here are my review comments for v13-00001.

======
Commit message

1.
The DDLs like Refresh Materialized views that generate lots of temporary
data due to rewrite rules may not be processed by output plugins (for
example pgoutput). So, we won't send keep-alive messages for a long time
while processing such commands and that can lead the subscriber side to
timeout.

SUGGESTION (minor rearranged way to say the same thing)

For DDLs that generate lots of temporary data due to rewrite rules
(e.g. REFRESH MATERIALIZED VIEW) the output plugins (e.g. pgoutput)
may not be processed for a long time. Since we don't send keep-alive
messages while processing such commands that can lead the subscriber
side to timeout.

~~~

2.
The commit message says what the problem is, but it doesn’t seem to
describe what this patch does to fix the problem.

======
src/backend/replication/logical/reorderbuffer.c

3.
+ /*
+ * It is possible that the data is not sent to downstream for a
+ * long time either because the output plugin filtered it or there
+ * is a DDL that generates a lot of data that is not processed by
+ * the plugin. So, in such cases, the downstream can timeout. To
+ * avoid that we try to send a keepalive message if required.
+ * Trying to send a keepalive message after every change has some
+ * overhead, but testing showed there is no noticeable overhead if
+ * we do it after every ~100 changes.
+ */

3a.
"data is not sent to downstream" --> "data is not sent downstream" (?)

3b.
"So, in such cases," --> "In such cases,"

~~~

4.
+#define CHANGES_THRESHOLD 100
+
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change->lsn);
+ changes_count = 0;
+ }

I was wondering if it would have been simpler to write this code like below.

Also, by doing it this way the 'changes_count' variable name makes
more sense IMO, otherwise (for current code) maybe it should be called
something like 'changes_since_last_keepalive'

SUGGESTION
if (++changes_count % CHANGES_THRESHOLD == 0)
rb->update_progress_txn(rb, txn, change->lsn);

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Re: Logical replication timeout problem at 2023-01-31 11:27:49 from Amit Kapila

Responses

Re: Logical replication timeout problem at 2023-02-01 03:35:48 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2023-01-31 23:19:39	Re: Show various offset arrays for heap WAL records
Previous Message	Peter Geoghegan	2023-01-31 22:47:35	Re: Show various offset arrays for heap WAL records