Re: Logical replication timeout problem

From: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
To: Tang, Haiying/唐 海英 <tanghy(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication timeout problem
Date: 2021-11-11 17:44:51
Message-ID: CAA5-nLABf97QKAR8K8NiQs2s6_323dvd7kpAdJ3GZ+p2iR5K7A@mail.gmail.com
Lists: pgsql-hackers

Hello,
Our lab is ready now. Amit, I compiled Postgres 10.18 with your patch. Tang,
I used your script to configure logical replication between 2 databases and
to generate 10 million entries in an unreplicated foo table. On a
standalone instance, no error message appears in the log.
When I activated physical replication between 2 nodes, I got the following
error:

2021-11-10 10:49:12.297 CET [12126] LOG: attempt to send keep alive message
2021-11-10 10:49:12.297 CET [12126] STATEMENT: START_REPLICATION 0/3000000 TIMELINE 1
2021-11-10 10:49:15.127 CET [12064] FATAL: terminating logical replication worker due to administrator command
2021-11-10 10:49:15.127 CET [12036] LOG: worker process: logical replication worker for subscription 16413 (PID 12064) exited with exit code 1
2021-11-10 10:49:15.155 CET [12126] LOG: attempt to send keep alive message

This message looks strange, because no admin command was executed
during the data load.
I did not find any error related to the timeout.
The message added by the patch, "attempt to send keep alive message",
appears in the log all the time, but there is never a "sent keep alive
message".
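
One way to check this on a larger log is to count both messages per
walsender PID. The following is only a minimal sketch in Python. It assumes
the log line prefix shown above (timestamp, then the PID in brackets) and
assumes that the patch logs the exact text "sent keep alive message" when a
keepalive really goes out. count_keepalives is a hypothetical helper, not
part of the patch.

```python
import re
from collections import Counter

# Message texts as quoted in this mail. The "sent" text is an assumption
# about what the patched walsender logs when a keepalive is actually sent.
ATTEMPT = "attempt to send keep alive message"
SENT = "sent keep alive message"

# Assumed log prefix: "<timestamp> <tz> [<pid>] LOG: <message>"
LINE_RE = re.compile(r"\[(?P<pid>\d+)\]\s+LOG:\s+(?P<msg>.*)$")


def count_keepalives(lines):
    """Return a Counter keyed by (pid, "attempt") and (pid, "sent")."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            # Not a LOG line (e.g. STATEMENT or FATAL), skip it.
            continue
        msg = match.group("msg").strip()
        if msg.startswith(ATTEMPT):
            counts[(match.group("pid"), "attempt")] += 1
        elif msg.startswith(SENT):
            counts[(match.group("pid"), "sent")] += 1
    return counts
```

Calling count_keepalives(open(path)) on the server log and comparing the
two counts for a given PID would show whether any attempt was ever followed
by a sent keepalive.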

Why does the logical replication worker exit when physical replication is
configured?

Thanks for your help

Fabrice

On Fri, Oct 8, 2021 at 9:33 AM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
wrote:

> Thanks Tang for your script.
> Our debugging environment will be ready soon. I will test your script and
> we will try to reproduce the problem by integrating the patch provided by
> Amit. As soon as I have results I will let you know.
>
> Regards
>
> Fabrice
>
> On Thu, Sep 30, 2021 at 3:15 AM Tang, Haiying/唐 海英 <
> tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
>
>> On Friday, September 24, 2021 12:04 AM, Fabrice Chapuis <
>> fabrice636861(at)gmail(dot)com> wrote:
>>
>> > Thanks for your patch, we are going to set up a lab in order to debug
>> > the function.
>>
>> Hi
>>
>> I tried to reproduce this timeout problem on version 10.18 but failed.
>>
>> In my trial, I inserted large amounts of data at the publisher, which took
>> more than 1 minute to replicate.
>>
>> And with the patch provided by Amit, I saw that the frequency of invoking
>> the WalSndKeepaliveIfNecessary function increased after I inserted data.
>>
>> The test script is attached. Maybe you can try it on your machine and
>> check if this problem could happen.
>>
>> If I missed something in the script, please let me know.
>>
>> Of course, it would be better if you could provide your script to
>> reproduce the problem.
>>
>> Regards
>>
>> Tang
>>
>>
