FATAL: terminating walreceiver process due to administrator command

From: "Maeldron T(dot)" <maeldron(at)gmail(dot)com>
To: Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: FATAL: terminating walreceiver process due to administrator command
Date: 2019-02-01 14:32:29
Message-ID: CAKatfSnQP4gwpGNPxT6Gg-HFL9T6yefYaiSGhx=j5mrgOGV1Rg@mail.gmail.com
Lists: pgsql-bugs

Hello,

Today, I received email notifications from a server telling me the
replication was lagging.

The application monitors the replication delay. In the past, I received
this notification a few times during high load, but never from this
server.

However, the replication was not lagging. It stopped.

There was a single line in the log of the standby server:

FATAL: terminating walreceiver process due to administrator command

I did not find anything related in the master’s log. The rest of the log
was all about the slow statements.

As far as I can tell, this has never happened before. The standby server
was running but the replication had stopped. I restarted the server 42
minutes after that line appeared in the log. The replication caught up in
3-5 seconds.
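
A minimal sketch of how a monitor could tell a stopped standby apart from a lagging one. It is not the application mentioned in this report. It assumes the caller has already queried the standby: `receiver_status` would come from the `status` column of the `pg_stat_wal_receiver` view (no row means no walreceiver process is running), and `replay_delay_s` from something like `now() - pg_last_xact_replay_timestamp()`. The function name and the 60-second threshold are hypothetical.

```python
from typing import Optional


def classify_standby(receiver_status: Optional[str],
                     replay_delay_s: Optional[float],
                     max_delay_s: float = 60.0) -> str:
    """Return 'stopped', 'lagging', or 'ok' for a standby server.

    receiver_status: pg_stat_wal_receiver.status, or None when the view
                     returned no row (assumption: no walreceiver process).
    replay_delay_s:  seconds since the last replayed transaction, or None
                     when it could not be determined.
    max_delay_s:     hypothetical alert threshold in seconds.
    """
    # No walreceiver at all: replication is not slow, it is not running.
    if receiver_status is None:
        return "stopped"
    # A walreceiver exists but replay is further behind than allowed.
    if replay_delay_s is not None and replay_delay_s > max_delay_s:
        return "lagging"
    return "ok"
```

With these assumptions, a standby with no walreceiver row is reported as "stopped" no matter how large the measured delay is, so a 42-minute gap like the one here would not be labelled as mere lag.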

Only I have access to the servers.

I did not stop the replication process. I don’t even know how to do it.

There is no cron task that would do such a thing. Only one application
accesses the database. I wrote it, so I know it didn’t do it either.

I have been running the servers for years with more or less the same
configuration.

As far as I can see, whenever the same line appears in earlier logs, the
database was being shut down as well. This is the only time the line
appears on its own.

Recent changes on the servers:

* On 11 January, I upgraded from 10.5 to 10.6_2

* A few days ago, I set up a new server that replicates one table from the
same master. This is a huge table but it’s rarely written. The replication
works. This was also when I started using logical replication. The server
where the replication stopped uses asynchronous streaming replication.

* When I set up the logical replication, I increased the wal_sender_timeout

I found nothing related in the logs (/var/log/messages, /var/log/all.log,
dmesg). This slave is probably the least loaded server of the group.

FreeBSD xxx 11.2-RELEASE-p8 FreeBSD 11.2-RELEASE-p8 #0: Tue Jan 8 21:35:12
UTC 2019 root(at)amd64-builder(dot)daemonology(dot)net:/usr/obj/usr/src/sys/GENERIC
amd64

/boot/loader.conf:
# PostgreSQL
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256

Everything else is either FreeBSD default or unrelated.

There is a lot of free memory. I don’t mean usable but free: 3 GB of RAM
has not even been touched since the last boot.

M.
