Re: Logical replication timeout problem

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Cc:	Tang, Haiying/唐海英 <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Logical replication timeout problem
Date:	2022-01-13 13:59:02
Message-ID:	CAA4eK1+4yXeUQ1E=5C8xHN5VpO=_+VKP-QnoKDLi0KWpEE8wSA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jan 13, 2022 at 3:43 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com> wrote:
>
> first phase: postgres read WAL files and generate 1420 snap files.
> second phase: I guess, but on this point maybe you can clarify, postgres has to decode the snap files and remove them if no statement must be applied on a replicated table.
> It is from this point that the worker process exit after 1 minute timeout.
>

Okay, I think the problem could be that, because we are skipping all
the changes of the transaction, nothing is sent to the subscriber and
it eventually times out. Actually, we try to send a keepalive at
transaction boundaries, for example when we call pgoutput_commit_txn,
which in turn calls OutputPluginWrite->WalSndWriteData. I think that
to tackle the problem we need to send such keepalives via
WalSndUpdateProgress as well, and invoke that in pgoutput_change when
we skip sending the change.
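
The idea above can be illustrated with a toy model in C. This is only a
sketch under stated assumptions, not PostgreSQL source: ToyChange,
ToyStats, toy_decode_txn and the keepalive_every parameter are all
hypothetical names invented for illustration, and "sending" is reduced
to incrementing counters. It shows how a long run of skipped changes
produces no traffic at all unless the skip path itself reports progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in PostgreSQL. */

typedef struct ToyChange {
    bool published;     /* does the change touch a replicated table? */
} ToyChange;

typedef struct ToyStats {
    int data_msgs;      /* changes actually sent to the subscriber */
    int keepalives;     /* keepalive messages sent while skipping */
    int max_silence;    /* longest run of changes with nothing sent */
} ToyStats;

/*
 * Walk the changes of one transaction.
 *
 * keepalive_every == 0 models the situation described in the mail:
 * skipped changes generate no message, so the subscriber hears nothing.
 * keepalive_every > 0 models the proposed fix: after that many
 * consecutive skipped changes, a keepalive is sent from the skip path.
 */
ToyStats toy_decode_txn(const ToyChange *changes, size_t n, int keepalive_every)
{
    ToyStats st = {0, 0, 0};
    int silence = 0;    /* changes processed since the last message */

    assert(keepalive_every >= 0);

    for (size_t i = 0; i < n; i++) {
        if (changes[i].published) {
            /* A real change goes out, which also resets the silence. */
            st.data_msgs++;
            silence = 0;
            continue;
        }

        /* Change is skipped: nothing to send for it. */
        silence++;
        if (keepalive_every > 0 && silence >= keepalive_every) {
            st.keepalives++;
            silence = 0;
        }
        if (silence > st.max_silence)
            st.max_silence = silence;
    }
    return st;
}
```

Calling toy_decode_txn on a transaction whose changes are all skipped
shows the difference: with keepalive_every set to 0 the silent run
spans the whole transaction, while a positive value bounds it. In a
real walsender the trigger would more plausibly be elapsed time than a
change count; the count is used here only to keep the sketch
deterministic.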

--
With Regards,
Amit Kapila.

In response to

Re: Logical replication timeout problem at 2022-01-13 10:13:02 from Fabrice Chapuis

Responses

Re: Logical replication timeout problem at 2022-01-14 10:17:07 from Fabrice Chapuis
