terminating walsender process due to replication timeout

From: AYahorau(at)ibagroup(dot)eu
To: pgsql-general(at)postgresql(dot)org
Subject: terminating walsender process due to replication timeout
Date: 2019-05-13 11:36:06
Message-ID: OF85C33E30.171C1C23-ON432583F9.003F5B16-432583F9.003FBAD7@iba.by
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello PostgreSQL Community!

I faced an issue on my linux machine using Postgres 11.3 .
I have 2 nodes in db cluster: master and standby.
I tried to perform a plenty of long-running queries which lead to the
databases desynchronization:
terminating walsender process due to replication timeout

Here is the output in debug mode:
2019-05-13 13:21:33 FET 00000 DEBUG: sending replication keepalive
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed;
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed;
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed;
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed;
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed;
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed;
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed;
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed;
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed;
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed;
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed;
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed;
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 LOG: terminating walsender process due to
replication timeout

The issue is reproducible. I configure 2 nodes cluster, download
demo_small.zip from https://edu.postgrespro.ru/ and run the following
command:
psql -U user1 -f demo_small.sql db1
and I get the observed behaviour.

I know that I can increase wal_sender_timeout value to avoid this
behaviour (currently wal_sender_timeout is equal to 1 second.)
To be honest I don't want to increase wal_sender_timeout because I would
like to detect some network issues quickly.

After having googled I found that someone faced a similar issue
https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
which was fixed in PostgreSQL 9.4.16.

Is my issue the same as described here
https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
?
Is there any other chance to avoid it without increasing
wal_sender_timeout?

Thank you in advance.
Regards,
Andrei

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2019-05-13 13:56:32 Re: perl path issue
Previous Message Achilleas Mantzios 2019-05-13 10:02:40 Re: perl path issue