Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)
Date: 2019-12-19 01:39:48
Message-ID: 20191219.103948.2182167132387989763.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

At Wed, 18 Dec 2019 17:09:15 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> > 18.12.2019 19:03, Tom Lane wrote:
> >> Even if fsync_fname_ext() happens to be the only place that does that
> >> today (which I doubt) it's going to bite us on the rear regularly in
> >> the future, because there's no way to enforce it, and most developers
> >> aren't going to notice the problem in testing.
>
> > Regarding fsync_fname_ext(), I thought that it's OK to avoid performing
> > a call that is known to fail. And even if someone will try to open() a
> > directory, he will need to add the same check for EACCES, otherwise the
> > code will fail on Windows. In fact, I see the same pattern in
> > pre_sync_fname, fsync_fname in file_utils.c.
>
> My point is that it's totally unreasonable to expect callers to always
> know, in advance of accessing the filesystem, whether the pathname they
> are going to access is a directory or not. The point of calling a
> filesystem access function is to find out that kind of information.
> Yes, there are many cases where we can expect that PG can predict
> this, but an open() replacement has no business making such a sweeping
> assumption about its use-cases.
>
> >> Is there a way to get the original kernel error code?
> >> It'd be a lot better if we could be sure that the condition
> >> is STATUS_DELETE_PENDING before looping.
>
> > Unfortunately no, there is no known way  to get the kernel error code.
>
> Bleah. I did a little bit of googling and couldn't find anything
> to contradict that, but it's sure annoying.

The codes seems to be indentifiable only on device driver layer.

> > So I see three ways now:
> > 1) Revert the pgwin32_open change and choose the previously rejected
> > approach: Unlink postmaster.pid using rename operation (adopt the
> > solution from https://stackoverflow.com/questions/3764072/).
> > 2) Add a check for directory in pgwin32_open, but I think then we're
> > getting close to checking there just for postmaster.pid.
> > 3) Find out how often we get ERROR_ACCESS_DENIED during all regression
> > tests and if such error count is near zero, then add the check for isdir
> > in fsync_fname_ext and friends, and get done.
>
> I'm not terribly thrilled with (1) because I do not think that
> postmaster.pid accesses are the only place where we have this issue.
> And (3) is not only not a forward-looking solution, but it imagines
> that the speed of the regression tests is the only thing we need to
> worry about here. We generally don't assume that the regression
> tests are an accurate model of the performance users will see.
>
> So that leaves us with (2), it seems.

Agreed for (2).

I'm not sure about the point of the discussion that stat()'ing is
messy. If it's about performance, does stat()ing only when CreateFile
returned ERROR_ACCESS_DENIED helps? Anyway we wait for a moment in
that case.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alexander Lakhin 2019-12-19 04:00:00 Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)
Previous Message Tom Lane 2019-12-18 22:09:15 Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)