Quick Links

BUG #16174: PostgreSQL 9.6 service can not start after unexpected shutdown

From:	PG Bug reporting form <noreply(at)postgresql(dot)org>
To:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc:	pg(dot)dba(dot)iit(dot)team(at)gmail(dot)com
Subject:	BUG #16174: PostgreSQL 9.6 service can not start after unexpected shutdown
Date:	2019-12-20 12:37:20
Message-ID:	16174-8bd292aa647c8f3c@postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16174
Logged by: Enrico La Torre
Email address: pg(dot)dba(dot)iit(dot)team(at)gmail(dot)com
PostgreSQL version: 9.6.12
Operating system: Windows Server 2016 Standard 10.0.14393
Description:

Hi,

in my company we run several thousands of PostgreSQL clusters currently with
version 9.6.12 (one cluster per server). We run PostgreSQL on a virtual
machine in Hyper-V Manager.
A minor upgrade to the latest minor version is planned but I did not find
any fix for this incident in any minor upgrade from 9.6.12 to 9.6.16.

The same incident has been reported in
https://www.postgresql.org/message-id/CAJtpbKR2NZKmoxveMSdJh%3Dzng1%2B6NwqK3EEbwQs0ac6oHE%3D4eg%40mail.gmail.com
for PostgresSQL 11 but it has not been responded to yet. However, we
experience the same incident in with PostgreSQL 9.6.

We can only correlate this incident to a reboot after an unexpected shutdown
of Windows. During restart it can(!) happen that the PostgreSQL service can
not be started by the
Service Control Manager because the PostgreSQL cluster is already running in
the background. A pg_ctl status will return a running database server. This
behavior must be related
to a race condition, because the incident does not occur every time or can
not be reproduced easily.

The unexpected shutdowns originate from either power outages or more likely
due to restarts of the server that hosts the Hyper-V Manager. If this server
gets patched and nees to restart it basically kills all running VMs in that
process, if they don't shutdown fast enough. Of course you can argue that we
must get rid of these unexpected shutdowns. But isn't the boot
process always the same after a shutdown nor matter if it was shutdown
cleanly or not? Why is the PostgreSQL cluster started in the background and
not by the Service Control Manager?
Is there a bug in the interplay between pg_ctl and the Windows Service
Control Manager?

At the bottom I attached an excerpt of the relevant log files.

Kind regards

PostgreSQL version number you are running:
9.6.12

How you installed PostgreSQL:
We initally started with PostgreSQL 9.6.6 and the EnterpriseDB installer
postgresql-9.6.6-2-windows-x64.exe
The minor upgrade to 9.6.12 was an installation with EnterpriseDB installer
postgresql-9.6.12-1-windows-x64.exe and then simply a distribution of all
9.6.12 binary files to all other servers.

Changes made to the settings in the postgresql.conf file:

Operating system and version:
Windows Server 2016 Standard 10.0.14393

What program you're using to connect to PostgreSQL:
psql, pgadmin4, ODBC driver PostgreSQL Unicode(x64) 9.06.04.00

Is there anything relevant or unusual in the PostgreSQL server logs?:

in pg_log

2019-12-19 03:35:06 CST [4756]: [1] LOG: database system was interrupted;
last known up at 2019-12-19 03:21:56 CST
2019-12-19 03:35:06 CST [3924]: [1]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: connection received: host=446-091STL01.aldi-499.loc port=62037
2019-12-19 03:35:06 CST [3788]: [1]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: connection received: host=446-091STL01.aldi-499.loc port=62042
2019-12-19 03:35:06 CST [5844]: [1]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: connection received: host=446-091STL01.aldi-499.loc port=62044
2019-12-19 03:35:07 CST [3780]: [1]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: connection received: host=446-091STL01.aldi-499.loc port=62043
2019-12-19 03:35:07 CST [3924]: [2]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: could not accept SSL connection: An established connection was aborted
by the software in your host machine.
2019-12-19 03:35:07 CST [3788]: [2]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: could not accept SSL connection: An established connection was aborted
by the software in your host machine.
2019-12-19 03:35:07 CST [3780]: [2]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: could not accept SSL connection: An established connection was aborted
by the software in your host machine.
2019-12-19 03:35:08 CST [5844]: [2]
user=SVCACC_446_PSQL,db=postgres,app=[unknown],client=446-091STL01.aldi-499.loc
FATAL: the database system is starting up
2019-12-19 03:35:09 CST [4144]: [1]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: connection received: host=446-091STL01.aldi-499.loc port=62046
2019-12-19 03:35:09 CST [4144]: [2]
user=SVCACC_446_PSQL,db=postgres,app=[unknown],client=446-091STL01.aldi-499.loc
FATAL: the database system is starting up
2019-12-19 03:35:10 CST [4548]: [1]
user=[unknown],db=[unknown],app=[unknown],client=446-091STL01.aldi-499.loc
LOG: connection received: host=446-091STL01.aldi-499.loc port=62048
[...] These messages continue until

2019-12-19 03:36:05 CST [5236]: [2]
user=SVCACC_446_PSQL,db=postgres,app=[unknown],client=446-091STL01.aldi-499.loc
FATAL: the database system is starting up
2019-12-19 03:37:29 CST [4756]: [2] LOG: database system was not properly
shut down; automatic recovery in progress
2019-12-19 03:37:29 CST [4756]: [3] LOG: redo starts at 26/23141D00
2019-12-19 03:37:29 CST [4756]: [4] LOG: invalid record length at
26/23151158: wanted 24, got 0
2019-12-19 03:37:29 CST [4756]: [5] LOG: redo done at 26/23151120
2019-12-19 03:37:30 CST [4756]: [6] LOG: last completed transaction was at
log time 2019-12-19 03:23:00.369744-06
2019-12-19 03:37:30 CST [4756]: [7] LOG: checkpoint starting:
end-of-recovery immediate
2019-12-19 03:37:30 CST [4756]: [8] LOG: checkpoint complete: wrote 10
buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=0.006 s, sync=0.414 s, total=0.657 s; sync files=5, longest=0.098 s,
average=0.082 s; distance=61 kB, estimate=61 kB
2019-12-19 03:37:30 CST [4756]: [9] LOG: MultiXact member wraparound
protections are now enabled
2019-12-19 03:37:30 CST [5380]: [3] LOG: database system is ready to accept
connections

in Event Viewer / Application

Level Date and Time Source Event ID Task Category
Information 12/19/2019 3:34:23 AM PostgreSQL 0 None Waiting for server
startup...
Information 12/19/2019 3:35:05 AM PostgreSQL 0 None "2019-12-19 03:35:05 CST
[5380]: [1] LOG: redirecting log output to logging collector
process2019-12-19 03:35:05 CST [5380]: [2] HINT: Future log output will
appear in directory ""pg_log""."
Error 12/19/2019 3:36:06 AM PostgreSQL 0 None Timed out waiting for server
startup
Information 12/19/2019 3:39:29 AM PostgreSQL 0 None Waiting for server
startup...
Error 12/19/2019 3:39:29 AM PostgreSQL 0 None "2019-12-19 03:39:29 CST
[5216]: [1] FATAL: lock file ""postmaster.pid"" already exists2019-12-19
03:39:29 CST [5216]: [2] HINT: Is another postmaster (PID 5380) running in
data directory ""G:/databases/PostgreSQL/9.6""?"
Error 12/19/2019 3:39:30 AM PostgreSQL 0 None Timed out waiting for server
startup

in Event Viewer / System

Level Date and Time Source Event ID Task Category
Information 12/19/2019 3:31:58
AM Microsoft-Windows-Kernel-General 12 (1) The operating system started at
system time ‎2019‎-‎12‎-‎19T09:31:58.494227200Z.
Information 12/19/2019 3:31:58 AM Microsoft-Windows-Kernel-Boot 20 (31) The
last shutdown's success status was false. The last boot's success status was
true.
Critical 12/19/2019 3:32:08 AM Microsoft-Windows-Kernel-Power 41 (63) The
system has rebooted without cleanly shutting down first. This error could be
caused if the system stopped responding, crashed, or lost power
unexpectedly.
Error 12/19/2019 3:39:30 AM Service Control Manager 7001 None "The
xxx-Server service depends on the postgresql - PostgreSQL Server 9.6 service
which failed to start because of the following error: The operation
completed successfully."
Information 12/19/2019 3:39:30 AM Service Control Manager 7036 None The
postgresql - PostgreSQL Server 9.6 service entered the stopped state.

What you were doing when the error happened / how to cause the error:
with PowerShell:

# stop the cluster
pg_ctl stop -D G:\databases\PostgreSQL\9.6 -m fast
# start the cluster via Service Control Manager
Start-Service -Name postgresql

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	PG Bug reporting form	2019-12-20 13:13:25	BUG #16175: To migrate MySQL To PostgreSQL
Previous Message	PG Bug reporting form	2019-12-20 10:41:32	BUG #16173: PostgreSQL 9.6 service can not start after unexpected shutdown