Re: Reasoning behind process instead of thread based

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Reasoning behind process instead of thread based
Date: 2004-11-02 09:48:14
Message-ID: Pine.LNX.4.61.0411021012430.29788@Megathlon.ESI
Lists: pgsql-general

[Cc: list minimized]

On Tue, 2 Nov 2004, Neil Conway wrote:

> I don't see the big difference between what Marco is suggesting and user
> threads -- or to be more precise, I think user threads and event-based
> programming are just two sides of the same coin. A user thread just
> represents the state of a computation -- say, a register context and some
> stack. It is exactly that *state* that is passed to a callback function in
> the event-based model. The only difference is that with user threads the
> system manages context for you, whereas the event-based model lets the
> programmer manage it. Which model is better is difficult to say.

Well, the difference is that in a pure event-driven model, you
(the programmer) have full control over what the state is. Any thread
library offers a "general purpose" thread, which may be more than
what you want/need.
Of course, very often userland threads are a good implementation of
an event-driven model. Think of GUIs.
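
A minimal sketch of what "full control over the state" can mean in
practice, in Python. All names here (SessionState, on_data, run) are
hypothetical and are not taken from PostgreSQL or any thread library.
The per-session state is a small record with exactly the fields the
callback needs, rather than a general-purpose register context plus
stack:

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Hypothetical per-session state: only what the handler needs."""
    session_id: int
    buffer: str = ""                                # partial input so far
    completed: list = field(default_factory=list)   # finished statements


def on_data(state, chunk):
    """Callback: consume one chunk and collect ';'-terminated statements."""
    state.buffer += chunk
    while ";" in state.buffer:
        stmt, state.buffer = state.buffer.split(";", 1)
        state.completed.append(stmt.strip())


def run(events):
    """Tiny dispatcher: events are (session_id, chunk) pairs in arrival order."""
    sessions = {}
    for sid, chunk in events:
        state = sessions.setdefault(sid, SessionState(sid))
        on_data(state, chunk)   # the state is passed explicitly to the callback
    return sessions
```

Two interleaved sessions share one flow of control here, and each one
costs only a short string and a list.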

The question is not whether to use threads. The problem is one
thread/process per session, as opposed to a few specialized threads or
one thread per outstanding query. We can start another "thread" :-) on
threads in general but it would be largely off-topic here.

> Martijn van Oosterhout wrote:
>> 1. non-blocking is nice, but lots of OSes (eg POSIX) don't support it
>> on disk I/O unless you use a completely different interface.
>
> We could implement I/O via something like POSIX AIO or a pool of worker
> threads that do the actual I/O in a synchronous fashion. But yeah, either way
> it's a major change.
>
>> 2. If one of your 'processes' decides to do work for half an hour (say,
>> a really big merge sort), you're stuck.
>
> It would be relatively easy to insert yield points into the code to prevent
> this from occurring. However, preemptive scheduling would come in handy when
> running "foreign" code (e.g. user-defined functions in C).
>
>> I honestly don't think you could really do a much better job of
>> scheduling than the kernel.
>
> I think we could do better than the kernel by taking advantage of
> domain-specific knowledge, I'm just not sure we could beat the kernel by
> enough to make this worth doing.
>
> BTW, I think this thread is really interesting -- certainly more informative
> than a rehash of the usual "processes vs. threads" debate.

Thanks, that was the whole point.

I thought that the event-driven model was well understood; I personally
consider it an established alternative to the threads/processes one.
I'd do a bad and pointless job of explaining it further. Please let me
just throw in a few URLs...

http://www.usenix.org/events/usenix01/full_papers/chandra/chandra_html/index.html

A random quote to attract readers: :-)

In general, thread-per-connection servers have the drawback of large
forking and context-switching overhead. In addition, the memory usage
due to threads' individual stack space can become huge for handling
large number of concurrent connections. The problem is even more
pronounced if the operating system does not support kernel-level
threads, and the application has to use processes or user-level
threads. It has been shown that thread-based servers do not scale well
at high loads [7]. Hence, many servers are structured as event-based
applications, whose performance is determined by the efficiency of event
notification mechanisms they employ. Pure event-based servers do not
scale to multiprocessor machines, and hence, on SMP machines, hybrid
schemes need to be employed, where we have a multi-threaded server
with each thread using event-handling as a mechanism for servicing
concurrent connections. Even with a hybrid server, the performance of
event-based mechanisms is an important issue. Since efficient event
dispatching is at the core of both event-based and hybrid servers,
we will focus on the former here.

http://www.kegel.com/c10k.html

This paper is very thorough: it covers almost all possible techniques
for implementing event-driven servers, and it is interesting reading
in any case.
Please note that the rationale behind it is the "C10k problem", which
I _don't_ think we're facing here. There are some nice properties
of event-driven servers other than being able to handle 100K connections,
IMHO.

All this started from the priority inversion problem, a few messages ago
on this list. The problem was how to 'slow down' a query.
In general, I've been thinking about a not-so-cooperative environment,
which calls for some active measures to limit the resources used by a
session (other than the DBA yelling at the (mis)user). Think of
high-density web services, with hundreds of sites on the same host.
Event-driven servers make it easy to take full control over the
resources allocated to each session.
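
As a rough illustration of that kind of control, here is a sketch in
Python of a single-threaded scheduler that 'slows down' one session by
giving it a smaller share of work per round. Everything in it is an
assumption made for the example: the names query and run, the notion
of a per-session step budget, and the use of generators as a stand-in
for yield points inside a query executor.

```python
from collections import deque


def query(name, units, log):
    """Hypothetical query: does `units` pieces of work, yielding after each."""
    for _ in range(units):
        log.append(name)    # one unit of work on behalf of this session
        yield               # yield point: control returns to the scheduler


def run(sessions, budgets):
    """Round-robin over sessions; each gets budgets[name] steps per round."""
    ready = deque(sessions.items())
    while ready:
        name, task = ready.popleft()
        steps = max(1, budgets.get(name, 1))   # at least one step, so we always progress
        for _ in range(steps):
            try:
                next(task)
            except StopIteration:
                break                          # query finished: drop the session
        else:
            ready.append((name, task))         # budget used up: back of the queue
```

With budgets of {"fast": 3, "slow": 1}, both queries complete, but the
"slow" session gets one step for every three given to "fast" while
both are runnable. The policy lives entirely in the server, not in the
kernel scheduler.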

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
