Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)
Date: 2019-12-18 19:00:01
Message-ID: b2cc98b1-0ba0-1358-572f-8a1a01a3880e@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

18.12.2019 19:03, Tom Lane wrote:
> Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
>> 18.12.2019 6:25, Kyotaro Horiguchi wrote:
>>> At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
>>>> It does look suspiciously like this wait is triggering on these
>>>> machines and somehow getting masked in most other test cases.
>> https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996
>> tells that STATUS_FILE_IS_A_DIRECTORY is mapped to ERROR_ACCESS_DENIED too.
> Ugh. I wondered if we could really get away with slowing down all
> ERROR_ACCESS_DENIED cases.
>
>> It seems that the cause of the issue is in fsync_fname_ext:
> I think you're blaming the messenger. We cannot suddenly impose a
> rule that people mustn't try to open() files that might be directories.
> Even if fsync_fname_ext() happens to be the only place that does that
> today (which I doubt) it's going to bite us on the rear regularly in
> the future, because there's no way to enforce it, and most developers
> aren't going to notice the problem in testing.
Regarding fsync_fname_ext(), I thought that it's OK to avoid performing
a call that is known to fail. And even if someone will try to open() a
directory, he will need to add the same check for EACCES, otherwise the
code will fail on Windows. In fact, I see the same pattern in
pre_sync_fname, fsync_fname in file_utils.c.
> It's possible that we could deal with this specific case by having
> pgwin32_open() try to stat the file and see if it's a directory. But
> I think that's messy. Also, the list of kernel-level conditions that
> that webpage says are mapped to ERROR_ACCESS_DENIED is scarily long:
>
> STATUS_THREAD_IS_TERMINATING ERROR_ACCESS_DENIED
> STATUS_PROCESS_IS_TERMINATING ERROR_ACCESS_DENIED
> STATUS_INVALID_LOCK_SEQUENCE ERROR_ACCESS_DENIED
> STATUS_INVALID_VIEW_SIZE ERROR_ACCESS_DENIED
> STATUS_ALREADY_COMMITTED ERROR_ACCESS_DENIED
> STATUS_ACCESS_DENIED ERROR_ACCESS_DENIED
> STATUS_FILE_IS_A_DIRECTORY ERROR_ACCESS_DENIED
> STATUS_CANNOT_DELETE ERROR_ACCESS_DENIED
> STATUS_FILE_DELETED ERROR_ACCESS_DENIED
> STATUS_FILE_RENAMED ERROR_ACCESS_DENIED
> STATUS_DELETE_PENDING ERROR_ACCESS_DENIED
> STATUS_PORT_CONNECTION_REFUSED ERROR_ACCESS_DENIED
> ...
> STATUS_ENCRYPTION_FAILED ERROR_ACCESS_DENIED
> STATUS_DECRYPTION_FAILED ERROR_ACCESS_DENIED
> STATUS_NO_RECOVERY_POLICY ERROR_ACCESS_DENIED
> STATUS_NO_EFS ERROR_ACCESS_DENIED
> STATUS_WRONG_EFS ERROR_ACCESS_DENIED
> STATUS_NO_USER_KEYS ERROR_ACCESS_DENIED
> ...
> SEC_E_MESSAGE_ALTERED ERROR_ACCESS_DENIED
> SEC_E_OUT_OF_SEQUENCE ERROR_ACCESS_DENIED
>
> Maybe most of these can't occur during fopen, but I'd
> rather not assume that
I see your point about future uncertainty, but I don't think that
receiving e.g. STATUS_FILE_DELETED or STATUS_NO_EFS error every few
seconds and ignoring it is a normal practice.
> Is there a way to get the original kernel error code?
> It'd be a lot better if we could be sure that the condition
> is STATUS_DELETE_PENDING before looping.
Unfortunately no, there is no known way  to get the kernel error code.
As stated in https://stackoverflow.com/questions/3764072, there is the
GetFileInformationByHandleEx
<http://msdn.microsoft.com/en-us/library/aa364953(v=VS.85).aspx>
function, that can return DeletePending state, but to use it we should
open the file first (to get a file handle).

So I see three ways now:
1) Revert the pgwin32_open change and choose the previously rejected
approach: Unlink postmaster.pid using rename operation (adopt the
solution from https://stackoverflow.com/questions/3764072/).
2) Add a check for directory in pgwin32_open, but I think then we're
getting close to checking there just for postmaster.pid.
3) Find out how often we get ERROR_ACCESS_DENIED during all regression
tests and if such error count is near zero, then add the check for isdir
in fsync_fname_ext and friends, and get done.

I'm still thinking that we shouldn't perform doomed calls, but I'm ready
to change the course.

Best regards,
Alexander

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jeff Janes 2019-12-18 19:09:25 Re: How to prevent POSTGRES killing linux system from accepting too much inserts?
Previous Message Merlin Moncure 2019-12-18 18:34:36 Re: How to prevent POSTGRES killing linux system from accepting too much inserts?