Re: Reasoning behind process instead of thread based

From: "Scott Marlowe" <smarlowe(at)qwest(dot)net>
To: nd02tsk(at)student(dot)hig(dot)se
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Reasoning behind process instead of thread based
Date: 2004-10-27 16:21:27
Message-ID: 1098894087.31930.62.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 2004-10-27 at 09:56, nd02tsk(at)student(dot)hig(dot)se wrote:
> Hello!
>
> I have a couple of final ( I hope, for your sake ) questions regarding
> PostgreSQL.
>
> I understand PostgreSQL uses processes rather than threads. I found this
> statement in the archives:
>
> "The developers agree that multiple processes provide
> more benefits (mostly in stability and robustness) than costs (more
> connection startup costs). The startup costs are easily overcome by
> using connection pooling.
> "
>
> Please explain why it is more stable and robust? More from the above
> statement:

This question shows up every 6 months or so. You might wanna search the
archives (I use google to do that, but YMMV with the postgresql site's
search engine.)

Basically, there are a few issues with threading that pop up their ugly
heads. One: Not all OSes thread libraries are created equal. There
was a nasty bug in one of the BSDs that causes MySQL to crash a couple
years ago that drove them nuts. So programming a threaded
implementation means you have the vagaries of different levels of
quality and robustness of thread libraries to deal with. Two: If a
single process in a multi-process application crashes, that process
alone dies. The buffer is flushed, and all the other child processes
continue happily along. In a multi-threaded environment, when one
thread dies, they all die. Three: Multi-threaded applications can be
prone to race conditions that are VERY hard to troubleshoot, especially
if they occur once every million or so times the triggering event
happens.

On some operating systems, like Windows and Solaris, processes are
expensive, while threads are cheap, so to speak. this is not the case
in Linux or BSD, where the differences are much smaller, and the
multi-process design suffers no great disadvantage.

> "Also, each query can only use one processor; a single query can't be
> executed in parallel across many CPUs. However, several queries running
> concurrently will be spread across the available CPUs."
>
> And it is because of the PostgreSQL process architecture that a query
> can't be executed by many CPU:s right? Although I wonder if this is the
> case in MySQL. It only says in their manual that each connection is a
> thread.

Actually, if it were converted to multi-threaded tomorrow, it would
still be true, because the postgresql engine isn't designed to split off
queries into constituent parts to be executed by seperate threads or
processes. Conversely, if one wished to implement it, one could likely
patch postgresql to break up parts of queries to different child
processes of the current child process (grand child processes so to
speak) that would allow a query to hit multiple CPUs.

> Also, MySQL has a library for embedded aplications, the say:
>
> "We also provide MySQL Server as an embedded multi-threaded library that
> you can link into your application to get a smaller, faster,
> easier-to-manage product."
>
> Do PostgreSQL offer anything similar?

No, because in that design, if your application crashes, so does, by
extension, your database. Now, I'd argue that if I had to choose
between which database to have crash in the middle of transactions, I'd
pick PostgreSQL, it's generally considered a bad thing to have a
database crash mid transaction. PostgreSQL is more robust about crash
recovery, but still...

That's another subject that shows up every x months, an embedded version
of PostgreSQL. Basically, the suggestion is to use something like
SQLlite, which is built to be embedded, and therefore has a much lower
footprint than PostgreSQL could ever hope to achieve. No one wants
their embedded library using up gobs of RAM and disk space when it's
just handling one thread / process doing one thing. It's like
delivering Pizzas with a Ferrari, you could do it, it just eouldn't make
a lot of sense.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dianne Yumul 2004-10-27 16:22:25 Re: Newbie question about casting literals - oracle/postgres
Previous Message Doug McNaught 2004-10-27 16:12:06 Re: Reasoning behind process instead of thread based