Re: Logical replication timeout problem

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Cc: Tang, Haiying/唐 海英 <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication timeout problem
Date: 2022-01-12 10:54:21
Message-ID: CAA4eK1LdOCMWimk3CeN3xWMoLx6Er7qNuZ6DLmBvNv1YA5PGog@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 11, 2022 at 8:13 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
wrote:

> Can you explain why you think this will help in solving your current
> problem?
>
> Indeed, you are right: this function won't help, so we have to look elsewhere.
>
> It is still not clear to me why the problem happened. IIUC, after
> restoring 4096 changes from snap files, we send them to the subscriber, and
> then the apply worker should apply them one by one. Is it taking one
> minute to restore 4096 changes, causing the apply worker to time out?
>
> Now I can easily reproduce the problem.
> In the first phase, snap files are generated and stored in pg_replslot. This
> process ends when 1420 files are present in pg_replslot (this relates to the
> statements that must be replayed from WAL). In the pg_stat_replication
> view, the state field is set to *catchup*.
> In the second phase, the snap files must be decoded. However, after one minute
> (the wal_receiver_timeout parameter is set to 1 minute) the worker process stops
> with a timeout.
>
>
What exactly do you mean by the first and second phase in the above
description?

--
With Regards,
Amit Kapila.
