Re: Fw: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.

From: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fw: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.
Date: 2018-08-01 08:47:00
Message-ID: 20180801174700.e2638203ce5c5449163d6d2e@sraoss.co.jp
Lists: pgsql-hackers

On Fri, 20 Jul 2018 10:48:15 -0400
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> writes:
> > Recently, one of our clients reported a problem that Windows 10 sometime
> > (approximately once in 300 tries) hung up at OS starting up while PostgreSQL
> > 9.3.x service is starting up. My co-worker analyzed this and found that
> > PostgreSQL's auxiliary process and Windows' logon processes are in a dead-lock
> > situation.
>
> Really? What would they deadlock on? Why is there any connection
> whatsoever? Why has nobody else run into this?

It is not clear where the hang occurred, but this might be a problem
specific to one version of Windows. Our client reported that
the hang occurred with Windows 10 IoT Enterprise 2015 LTSB, but not
with Windows 10 IoT Enterprise 2016 LTSB or Windows 7.

>
> > He reported this problem to pgsql-general list as below. Also, he created a patch
> > to add a build-time option for adding 0.5 or 3.0 seconds delay after each sub
> > process starts.
>
> This seems like an ugly hack that probably doesn't reliably resolve
> whatever the problem is, but does manage to kill postmaster
> responsiveness :-(. It'd be especially awful to insert such a delay
> after forking parallel worker processes, which would be a problem in
> anything much newer than 9.3.

Agreed.

> I think we need more investigation; and to start with, reproducing
> the problem in a branch that's not within hailing distance of its EOL
> would be a good idea. (Not that I have reason to think PG's behavior
> has changed much here ... but 9.3 is just not a good basis for asking
> us to do anything now.)

They also reported that this problem occurred with Windows 10 IoT Enterprise
2015 LTSB + PostgreSQL 10.3 as well as PostgreSQL 9.3.22. However,
reproducing this will be hard because we don't have a Windows 10 IoT
environment, and the hang occurs only about once in 300 OS startups.

We will investigate this further and report back if we find something.

Regards,

--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
