Re: pg_listener entries deleted under heavy NOTIFY load only on Windows

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Radu Ilie <rilie(at)wsi(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_listener entries deleted under heavy NOTIFY load only on Windows
Date: 2010-01-31 17:18:43
Message-ID: 9837222c1001310918l1953971dh230d2a6c6910649@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 22, 2010 at 22:30, Radu Ilie <rilie(at)wsi(dot)com> wrote:
> On a Windows server under heavy load of NOTIFY events, entries in
> pg_listener table for some events are deleted. It is like UNLISTEN was
> called.
>
> PostgreSQL version: 8.3.9.
> Operating System: Windows XP.
>
> PostgreSQL believes that if it fails to notify a listener (by signaling the
> respective backend), then the backend doesn't exist anymore and so it should
> get rid of the pg_listener entry. The relevant code is in
> src/backend/commands/async.c, function Send_Notify:
>
> if (kill(listenerPID, SIGUSR2) < 0)
> {
> /*
> * Get rid of pg_listener entry if it refers to a PID that no
> * longer exists.  Presumably, that backend crashed without
> * deleting its pg_listener entries. This code used to only
> * delete the entry if errno==ESRCH, but as far as I can see
> * we should just do it for any failure (certainly at least
> * for EPERM too...)
> */
> simple_heap_delete(lRel, &lTuple->t_self);
> }
>
> The problem is that under Windows, kill can fail even if the process is
> still alive. PostgreSQL uses named pipes under Windows to send signals to
> backends. The present implementation has a bug that causes a client to fail
> to write data to the named pipe, even though the server process is alive.
> This is because the server doesn't maintain the named pipe at all times. The
> bug is present in file src/backend/port/win32/signal.c, function
> pg_signal_thread.
>
> The server code stays in a loop in which it continuously creates an instance
> of the named pipe (via CreateNamedPipe) and waits for a client process to
> connect (via ConnectNamedPipe). Once a client connects, the communication
> with the client is handled in a new thread, with the thread procedure
> pg_signal_dispatch_thread. This function is very simple: it reads one byte
> from the named pipe, it writes it back and (very important) closes the
> handle to the named pipe instance. The main loop creates another instance of
> the named pipe and waits for another client to connect.
>
> Now imagine that the server is under heavy load. There are dozens of
> backends and threads running and the CPU usage is close to 100%. The
> following succession of events is possible:
>
> 1. Server signal handling thread (in function pg_signal_thread) creates the
> first, one and only instance of the named pipe via CreateNamedPipe.
> 2. Server code starts waiting for clients to connect with ConnectNamedPipe.
> 3. Client wishes to make a transaction on the named pipe and calls
> CallNamedPipe (in file src/port/kill.c, function pgkill).
> 4. Server code returns from ConnectNamedPipe. It creates a new thread with
> the thread procedure pg_signal_dispatch_thread.
> 5. The signal dispatch thread is scheduled for execution and it runs to
> completion. As you can see, the last thing it does related to the named pipe
> is to close the handle via CloseHandle (in function
> pg_signal_dispatch_thread). This closes the last instance of the named pipe.
> The named pipe is gone. There is no more named pipe. The signal handling
> thread was not yet scheduled by the operating system for execution and thus
> didn't have an opportunity to call CreateNamedPipe.
> 6. Another client (or the same one, it doesn't matter) tries to write to the
> named pipe via CallNamedPipe. The call returns ERROR_FILE_NOT_FOUND, because
> the named pipe is gone. The client believes the backend is gone and it
> removes the entry from pg_listener.
> 7. The signal handling thread (in function pg_signal_thread) is finally
> scheduled for execution and it calls CreateNamedPipe. We now have an
> instance of the named pipe available.
>
> So we end up with the server backend alive, the named pipe is there, but the
> row is gone from pg_listener. This is easy to reproduce under Windows. I
> used the scripts posted by Steve Marshall in a similar thread from
> 01/15/2009 and the problem appears within one minute all the time. For
> testing I used a Windows XP machine with 2 cores and 2GB of RAM. The CPU
> usage was over 70% during the trials.

Thanks for a *very* detailed analysis. I've finally managed to
reproduce this problem, and test your fix.

Interestingly enough, the race condition is copied from Microsoft's
example of how to write a multi-threaded pipe server. That doesn't
make it any less real of course :-)

> The solution is to create a new instance of the named pipe before launching
> the signal dispatch thread. This means changing the code in
> src/backend/port/win32/signal.c to look like this:

I have made some minor stylistic changes to this, such as some missing
error reporting, and added a lot of comments to explain why we do
this, and applied.

If you can, please verify that the tip of CVS for your version solves
the problem in your real world scenario, and not just my reproduction
test case.

> This will guarantee that we have an instance of the named pipe available at
> any given moment. If we do this, we can also remove the 3 tries loop from
> src/port/kill.c:

I have removed these on HEAD only, and I'm leaving them in the
backbranches just to be sure there's nothing else hiding around there.

> As a note, the original code has a timeout of 1000 milliseconds specified in
> the call to CallNamedPipe in kill.c. That timeout doesn't help for this bug,
> because it is the timeout that the client will wait for an instance of the
> named pipe to be available for communication. This means it helps in the
> case when we have one client already using that instance to read/write to
> the server and this second client will wait until the first one finishes. It
> does not help if there is no instance at all. In that case the CallNamedPipe
> returns immediately with ERROR_FILE_NOT_FOUND.

That's indeed true. I guess the fact that it worked with 3 was that it
pushed the likelihood of getting context-switched up far enough so the
race wouldn't show up :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2010-01-31 17:24:39 Re: odd output in initdb
Previous Message Magnus Hagander 2010-01-31 17:02:56 Re: Pathological regexp match