Re: Logical replication timeout problem

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Cc: Tang, Haiying/唐 海英 <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication timeout problem
Date: 2022-01-13 13:59:02
Message-ID: CAA4eK1+4yXeUQ1E=5C8xHN5VpO=_+VKP-QnoKDLi0KWpEE8wSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 13, 2022 at 3:43 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com> wrote:
>
> first phase: postgres read WAL files and generate 1420 snap files.
> second phase: I guess, but on this point maybe you can clarify, postgres has to decode the snap files and remove them if no statement must be applied on a replicated table.
> It is from this point that the worker process exit after 1 minute timeout.
>

Okay, I think the problem could be that because we are skipping all
the changes of transaction there is no communication sent to the
subscriber and it eventually timed out. Actually, we try to send
keep-alive at transaction boundaries like when we call
pgoutput_commit_txn. The pgoutput_commit_txn will call
OutputPluginWrite->WalSndWriteData. I think to tackle the problem we
need to try to send such keepalives via WalSndUpdateProgress and
invoke that in pgoutput_change when we skip sending the change.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-01-13 13:59:58 SLRUs in the main buffer pool, redux
Previous Message Peter Eisentraut 2022-01-13 13:42:42 Re: Non-decimal integer literals