"Operation on non-socket" analysis

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: "Operation on non-socket" analysis
Date: 2004-09-22 11:57:16
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE475D1B@algol.sollentuna.se
Lists: pgsql-hackers-win32

Hello!

I have now, with the help of Harald, analysed the problem with
"Operation attempted on something that is not a socket" error on win32
when some third-party LSPs are installed.

My initial thought was that it had something to do with our
blocking-over-non-blocking emulation code, which is there to handle signal
delivery, since this problem does not happen to a lot of other
programs. This turned out to be incorrect.

The problem is related to the multi-process model used by postgresql,
whereas most win32 programs use a multi-threaded model. It seems that at
least the LSP Harald has had problems with, and I bet most others, break
socket inheritance. We accept() the socket in the postmaster, then
CreateProcess() a new process that inherits the handle. This breaks with
these LSPs.
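For comparison, on Unix the postmaster gets the same effect with fork(), and an inherited socket descriptor simply works in the child. Below is a minimal POSIX sketch of that inheritance pattern. It is not code from PostgreSQL or from the attached test program. It uses socketpair() as a stand-in for the accepted client connection so that it is self-contained, and the function name child_can_use_inherited_socket() is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child process could use a socket
   descriptor it inherited from its parent, 0 otherwise. */
int child_can_use_inherited_socket(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }
    if (pid == 0) {
        /* Child: sv[1] was inherited across fork(), much like the accepted
           client socket a backend inherits from the postmaster. */
        close(sv[0]);
        ssize_t n = write(sv[1], "ok", 2);
        _exit(n == 2 ? 0 : 1);
    }

    /* Parent: drop its copy of the child's end and read the reply. */
    close(sv[1]);
    char buf[3] = {0};
    ssize_t got = read(sv[0], buf, 2);
    close(sv[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return got == 2 && strcmp(buf, "ok") == 0
        && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

On win32 the corresponding step is CreateProcess() with bInheritHandles set to TRUE. That is the step the affected LSPs appear to break.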

Per Microsoft's own documentation, what we do should work, since we
are NT-only and not 9x (see for example
http://support.microsoft.com/default.aspx?scid=kb;en-us;150523 - "Under
Windows NT and Windows 2000, socket handles are inheritable by default.
This feature is often used by a process that wants to spawn a child
process and have the child process interact with the remote application
on the other end of the connection."). This means that it is a bug in
the LSP.

That said, a workaround would be nice, since we are already receiving
several reports about this problem. I have tried using DuplicateHandle()
(which is strictly speaking incorrect, since the API does not let us know
that a HANDLE and a SOCKET are actually the same thing, but it was still
worth a try), and it shows the same behaviour. The only thing I can think
of testing further is using WSADuplicateSocket(). This is significantly
more complex to implement (for one thing, it requires the pid of the child
before the child can be allowed to run). I will see if/when I get a chance
to test this out in my test program - if somebody else beats me to writing
a test program for it, please do ;-)
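The ordering problem with WSADuplicateSocket() can be sketched roughly as follows. This is an untested outline that assumes the usual Winsock flow:

1. Create the child with CreateProcess() and the CREATE_SUSPENDED flag, so that its pid is known.
2. Call WSADuplicateSocket(sock, pi.dwProcessId, &info) to fill a WSAPROTOCOL_INFO for that specific pid.
3. Hand the structure to the child over some IPC channel. Which channel to use is left open here.
4. Call ResumeThread().
5. Have the child call WSASocket() with FROM_PROTOCOL_INFO and the received structure.

Since that cannot be compiled off Windows, the code below models only the ordering with POSIX calls:

- fork() stands in for creating the suspended child.
- A pipe stands in for the IPC channel.
- struct dup_info is a hypothetical stand-in for WSAPROTOCOL_INFO.
- handoff_after_child_exists() is likewise a hypothetical name.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for WSAPROTOCOL_INFO, which WSADuplicateSocket()
   fills in for one specific target process id. */
struct dup_info {
    pid_t target_pid;
    int   socket_id;   /* stand-in for the opaque socket description */
};

/* Returns 1 if the child received an info block built for its own pid. */
int handoff_after_child_exists(int socket_id)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    /* Step 1: create the child first so its pid is known
       (win32: CreateProcess() with CREATE_SUSPENDED). */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    if (pid == 0) {
        /* Child: blocks until the parent hands over the info. */
        struct dup_info info;
        close(fds[1]);
        ssize_t n = read(fds[0], &info, sizeof info);
        close(fds[0]);
        _exit((n == (ssize_t)sizeof info
               && info.target_pid == getpid()
               && info.socket_id == socket_id) ? 0 : 1);
    }

    /* Step 2: only now can the parent build the pid-specific info
       (win32: WSADuplicateSocket(sock, pi.dwProcessId, &info)). */
    struct dup_info info;
    memset(&info, 0, sizeof info);
    info.target_pid = pid;
    info.socket_id = socket_id;

    /* Step 3: deliver it so the child can proceed
       (win32: pass it over some IPC channel, then ResumeThread()). */
    close(fds[0]);
    ssize_t w = write(fds[1], &info, sizeof info);
    close(fds[1]);

    int status = 0;
    waitpid(pid, &status, 0);
    return w == (ssize_t)sizeof info && WIFEXITED(status)
        && WEXITSTATUS(status) == 0;
}
```

The point of the model is that the info block cannot be prepared before the child exists. For the same reason, it cannot be passed on the child's command line or in its environment, which are both fixed at creation time.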

Attached is the ugly little test program I wrote that shows this
behaviour. It works on my machines, but shows the error on Harald's
machine. Start it in one console (it needs to be a console, not a
double-click, or the messages are lost). Then from another console, telnet
to localhost on port 999 and type anything at all. It should show error
code 0; it shows error code 10038 when it fails.
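The 10038 code is the Winsock error WSAENOTSOCK (WSABASEERR + 38), which corresponds to the "not a socket" error in the subject line. Below is a minimal sketch for interpreting the test program's output. It assumes the printed value is a raw Winsock error code. The constants are restated under hypothetical SKETCH_ names so the sketch compiles without winsock2.h, and describe_test_result() is also a hypothetical name.

```c
/* Winsock error codes are WSABASEERR (10000) plus a BSD-style offset;
   restated here so the sketch compiles without winsock2.h. */
#define SKETCH_WSABASEERR   10000
#define SKETCH_WSAENOTSOCK  (SKETCH_WSABASEERR + 38)   /* 10038 */

/* Hypothetical helper: classify the code printed by the test program. */
const char *describe_test_result(int code)
{
    switch (code) {
    case 0:
        return "ok: the child could use the inherited socket";
    case SKETCH_WSAENOTSOCK:
        return "WSAENOTSOCK: the inherited handle is not a socket in the child";
    default:
        return "unexpected Winsock error";
    }
}
```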

If WSADuplicateSocket() does not fix it, we should probably add a check
for this early in the installer to tell the user what the problem is,
instead of erroring out the way we do now.

Does anybody have any further ideas on this subject?

//Magnus

<<sockt.c>>

Attachment: sockt.c (application/octet-stream, 2.0 KB)
