Re: Postgres 10.1 fails to start: server did not start in time

From: Christoph Berg <myon(at)debian(dot)org>
To: Adam Brusselback <adambrusselback(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres 10.1 fails to start: server did not start in time
Date: 2017-11-11 20:53:16
Message-ID: 20171111205316.u56lkmkakdmcx6zm@msg.df7cb.de
Lists: pgsql-general

Re: Adam Brusselback 2017-11-11 <CAMjNa7c5shJPR9sDz=VwNZz1LoTTi=dhO1CUCDJL=u8+WLL5pg(at)mail(dot)gmail(dot)com>
> Hey Christoph, I tried starting it with init (service postgresql
> start), and pg_ctlcluster.
>
> I modified the pg_ctl.conf and set the timeout higher so I could just
> get my cluster back up and running properly, so I can't give you the
> info on what systemctl status says at the moment.

It looks like it's interference from systemd here. The problem is
easily reproduced by putting '-t 0' into pg_ctl.conf:

● postgresql@10-main.service - PostgreSQL Cluster 10-main
   Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2017-11-11 21:10:56 CET; 9ms ago
  Process: 17946 ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast 10-main stop (code=exited, status=1/FAILURE)
  Process: 18000 ExecStart=postgresql@10-main --skip-systemctl-redirect 10-main start (code=exited, status=1/FAILURE)
 Main PID: 17878 (code=exited, status=0/SUCCESS)

Nov 11 21:10:55 lehmann systemd[1]: Starting PostgreSQL Cluster 10-main...
Nov 11 21:10:56 lehmann postgresql@10-main[18000]: Error: /usr/lib/postgresql/10/bin/pg_ctl /usr/lib/postgresql/10/bin/pg_ctl start -D /var/lib/postgresql/10/main -l /var/log/postgresql/postgresql-10-main.log -t 0 -s -o -c config_file="/etc/postgresql/10/main/postgresql.conf" exited with status 1:
Nov 11 21:10:56 lehmann postgresql@10-main[18000]: pg_ctl: server did not start in time
Nov 11 21:10:56 lehmann systemd[1]: postgresql@10-main.service: Control process exited, code=exited status=1
Nov 11 21:10:56 lehmann systemd[1]: Failed to start PostgreSQL Cluster 10-main.
Nov 11 21:10:56 lehmann systemd[1]: postgresql@10-main.service: Unit entered failed state.
Nov 11 21:10:56 lehmann systemd[1]: postgresql@10-main.service: Failed with result 'exit-code'.

In other words, systemd will by default stop a service whose start
command fails, even if the server itself is still coming up.
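
For anyone wanting to reproduce this, the change is a single line. This
is only a sketch, assuming the stock Debian layout where the per-cluster
pg_ctl options live in /etc/postgresql/10/main/pg_ctl.conf in a
pg_ctl_options setting; adjust the path for your version and cluster:

```
# /etc/postgresql/10/main/pg_ctl.conf  (path assumed for cluster 10/main)
# -t is pg_ctl's wait timeout in seconds. With 0, pg_ctl gives up
# waiting right away and reports "server did not start in time".
pg_ctl_options = '-t 0'
```

Starting the cluster through systemd afterwards should give a status
like the one above. Remember to revert the setting when done.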

I'm investigating whether it's a good idea to tell systemd to ignore
the exit code of pg_ctl(cluster). Possibly moving to Type=notify is the
best solution, but not all major versions support that yet.
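
To illustrate the first idea, here is a rough, untested sketch of a
systemd drop-in. It relies on the documented systemd behaviour that a
"-" prefix on the executable path makes a failing exit code count as
success. The drop-in file name is hypothetical, and the ExecStart line
is an approximation built from the status output above, not a copy of
the packaged unit. It is not necessarily what the eventual fix will
look like:

```
# /etc/systemd/system/postgresql@.service.d/ignore-start-exit.conf
# (hypothetical file name; any *.conf in that directory is read)
[Service]
# An empty ExecStart= clears the packaged command so it can be replaced.
ExecStart=
# The leading "-" tells systemd to record a non-zero exit but not to
# treat it as a start failure. %i expands to the instance, e.g. 10-main.
ExecStart=-/usr/bin/pg_ctlcluster --skip-systemctl-redirect %i start
```

A drop-in like this would need "systemctl daemon-reload" before it
takes effect. The obvious downside is that real start failures no
longer mark the unit as failed through the exit code, which is part of
why this needs investigating first.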

Will report back once I have a solution.

Christoph
