Re: Logical replication failed with SSL SYSCALL error

From: vignesh C <vignesh21(at)gmail(dot)com>
To: shaurya jain <12345shaurya(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication failed with SSL SYSCALL error
Date: 2023-04-20 06:19:13
Message-ID: CALDaNm3Yabfvm1=Wef1u8cHO517uRdzMr3eKD9SJShQvpftsJg@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Wed, 19 Apr 2023 at 17:26, shaurya jain <12345shaurya(at)gmail(dot)com> wrote:
>
> Hi Team,
>
> Could you please help me with this, It's urgent for the production environment.
>
> On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shaurya(at)gmail(dot)com> wrote:
>>
>> Hi Team,
>>
>> Could you please help, It's urgent for the production env?
>>
>> On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shaurya(at)gmail(dot)com> wrote:
>>>
>>> Hi Team,
>>>
>>> Postgres Version:- 13.8
>>> Issue:- Logical replication failing with SSL SYSCALL error
>>> Priority:-High
>>>
>>> We are migrating our database through logical replications, and all of sudden below error pops up in the source and target logs which leads us to nowhere.
>>>
>>> Logs from Source:-
>>> LOG: could not send data to client: Connection reset by peer
>>> STATEMENT: COPY public.test TO STDOUT
>>> FATAL: connection to client lost
>>> STATEMENT: COPY public.test TO STDOUT
>>>
>>> Logs from Target:-
>>> 2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
>>> 2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1250) exited with exit code 1
>>> 2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table synchronization worker for subscription " sub_tables_2_180", table "test" has started
>>> 2023-04-15 19:12:05 UTC:10.144.19.34(33276):postgres(at)webadmit_staging:[7112]:WARNING: there is no transaction in progress
>>> 2023-04-15 19:14:08 UTC:10.144.19.34(33324):postgres(at)webadmit_staging:[6052]:LOG: could not receive data from client: Connection reset by peer
>>> 2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2556) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2112) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1089) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for subscription "sub_tables_2_180" has started
>>> 2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for subscription "sub_tables_3_192" has started
>>> 2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for subscription "sub_tables_1_180" has started
>>>
>>> Just after this error, all other replication slots get disabled for some time and come back online along with COPY command with the new PID in pg_stat_activity.
>>>
>>> I have a few queries regarding this:-
>>>
>>> The exact reason for disconnection (Few articles claim memory and few network)
This might be caused by a network failure. Did you notice any network
instability? Could you check the TCP keepalive settings:
tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count?
With these set, a keepalive probe is sent once the connection has been
idle for tcp_keepalives_idle seconds. If the peer does not acknowledge
it within tcp_keepalives_interval seconds, the probe is retransmitted,
and the connection is considered dead after tcp_keepalives_count
probes have gone unanswered.
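
As a rough sketch of how these could be checked and adjusted (the
values, host, database and user below are illustrative assumptions
only, not recommendations for your setup):

```sql
-- On the publisher: show the current server-side keepalive settings.
-- A value of 0 means the operating system default is used.
SHOW tcp_keepalives_idle;
SHOW tcp_keepalives_interval;
SHOW tcp_keepalives_count;

-- Example values (assumed for illustration): probe after 60s idle,
-- retransmit every 10s, give up after 5 unanswered probes.
ALTER SYSTEM SET tcp_keepalives_idle = 60;
ALTER SYSTEM SET tcp_keepalives_interval = 10;
ALTER SYSTEM SET tcp_keepalives_count = 5;
SELECT pg_reload_conf();  -- applies to sessions started afterwards

-- On the subscriber: the equivalent libpq keepalive parameters in the
-- subscription's connection string. Host, dbname and user are
-- placeholders; CONNECTION replaces the whole string, so keep whatever
-- parameters your existing connection string already needs.
ALTER SUBSCRIPTION sub_tables_2_180 CONNECTION
  'host=source.example.com dbname=mydb user=repuser keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=5';
```

With those example values a dead peer would be noticed after roughly
60 + 10 * 5 = 110 seconds. If ALTER SYSTEM is not available on your
server (for example on a managed service), the same parameters would
have to be changed through whatever configuration mechanism the
provider offers.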

>>> Will it lead to data inconsistency?
No, it will not lead to inconsistency. The initial copy runs inside a
transaction, so if it fails that transaction is rolled back and no
partial data is left in the target table.

>>> Does this new PID COPY command again migrate the whole data of the test table once again?
Yes, in case of failure the table synchronization worker starts the
COPY again from the beginning, so the whole table is copied again.
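
If it helps, the per-table synchronization state can be watched on the
subscriber with something like the following. This is only a sketch
against the standard catalogs; it shows the state of each table, not
how far the COPY has progressed:

```sql
-- On the subscriber: synchronization state of each subscribed table.
-- srsubstate: i = initialize, d = data is being copied,
--             s = synchronized, r = ready (normal replication)
SELECT s.subname,
       sr.srrelid::regclass AS table_name,
       sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
ORDER BY s.subname;
```

A table that stays in state d across worker restarts is still in its
initial copy.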

Regards,
Vignesh
