Re: Reasoning behind process instead of thread based

From: Marco Colombo <marco(at)esi(dot)it>
To: Thomas Hallgren <thhal(at)mailblocks(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Reasoning behind process instead of thread based
Date: 2004-10-29 10:38:29
Message-ID: Pine.LNX.4.61.0410291143040.29788@Megathlon.ESI
Lists: pgsql-general

On Thu, 28 Oct 2004, Thomas Hallgren wrote:

> Marco,
>
>> I mean an entirely event driven server. The trickiest part is to handle
>> N-way. On 1-way, it's quite a clear and well-defined model.
>
> You need to clarify this a bit.
>
> You say that the scheduler is in user-space, yet there's only one thread per
> process and one process per CPU. You state that instead of threads, you want
> it to be completely event driven. In essence that would mean serving one
> event per CPU from start to end at any given time. What is an event in this
> case? Where did it come from? How will this system serve concurrent users?

Let's take a look at the bigger picture. We need to serve many clients,
that is many sessions, that is many requests (queries) at the same time.
Since there may be more than one active request, we need to schedule
them in some way. That's what I meant by "session scheduler".

The traditional accept&fork model doesn't handle that directly: by
creating one process per session, it relies on the process scheduler
in the kernel. I claim this is suboptimal, both because of the extra
resources allocated to each session, and because the kernel policies are
not perfectly tailored to the job of scheduling PG sessions (*).
Not to mention the postmaster has almost no control over these policies.

Now, threads help a bit in reducing the per session overhead. But that's
more an implementation detail, and it's _very_ platform specific.
Switching to threads has a great impact on many _details_ of the
server, and the benefits depend a lot on the platform, but the model is
just the same, with the same essential problems.
Many big changes for little gain. Let's explore, at least in theory,
the advantages of a completely different model (that implies a lot
of changes too, of course - but for something).

You ask what an event is? An event can be:
- input from a connection (usually a new query);
- notification that I/O needed by a pending query has completed;
- if we don't want a single query to starve the server, an alarm of some
kind (I think this is a corner case, but still possible);
- something else I haven't thought about.

At any given moment, there are many pending queries. Most of them
will be waiting for I/O to complete. That's how the server handles
concurrent users.
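
To make the idea concrete, here is a minimal sketch of such a loop on a
1-way machine. It is only an illustration, not a design for PG: the
names query() and run(), the event tuples, and the use of generators to
stand in for suspended queries are all assumptions of mine, and I/O
completion is simulated by re-queueing an event instead of asking the
kernel.

```python
from collections import deque


def query(name, io_steps):
    """Hypothetical query: runs until it needs I/O, then suspends."""
    for step in range(io_steps):
        yield f"{name}: waiting for I/O {step}"


def run(event_queue):
    """Single-threaded loop: each event is handled start to end."""
    log = []
    pending = {}  # session id -> suspended query (a generator)
    while event_queue:
        kind, sid, payload = event_queue.popleft()
        if kind == "input":
            # Input from a connection: register the new query.
            pending[sid] = payload
        # For both "input" and "io_done": run until the next I/O wait.
        try:
            log.append(next(pending[sid]))
            # Simulation only: pretend the I/O completes later.
            event_queue.append(("io_done", sid, None))
        except StopIteration:
            log.append(f"{sid}: done")
            del pending[sid]
    return log


events = deque([
    ("input", "A", query("A", 2)),
    ("input", "B", query("B", 1)),
])
for line in run(events):
    print(line)
```

The two sessions interleave even though there is only one thread: while
A waits for its I/O, the loop picks up B's event, and so on.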

>
> Regards,
> Thomas Hallgren

(*) They're oriented to general purpose processes. Think of how CPU
usage affects relative priorities. In a DB context, there may be
other criteria of greater significance. Roughly speaking, the larger
the part of the data a single session holds locked, the sooner it should
be completed. The kernel has no knowledge of this. To the kernel,
"big" processes are those that are using a lot of CPU. And the policy is
to slow them down. To a DB, "big" queries are those that force the most
serialization ("lock a lot"), and they should be completed as soon as
possible.
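
A rough sketch of that policy, under the assumption that the scheduler
can see a per-session count of held locks (the schedule() function and
the session names are made up for the example, and a real criterion
would surely be more refined than a plain count):

```python
import heapq


def schedule(sessions):
    """Order runnable sessions so the one holding most locks runs first.

    `sessions` maps a session name to the number of locks it holds
    (a hypothetical metric, used here only for illustration).
    """
    # heapq is a min-heap, so negate the lock count to pop the largest.
    heap = [(-locks, name) for name, locks in sessions.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


print(schedule({"report": 1, "bulk_update": 40, "lookup": 0}))
```

This is the opposite of what a CPU-based kernel policy would do with a
heavy session: here the session that blocks the most others goes first.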

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
