Re: Threads

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "PGHackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Threads
Date: 2003-01-03 20:52:48
Message-ID: D90A5A6C612A39408103E6ECDD77B829408A20@voyager.corporate.connx.com
Lists: pgsql-hackers

> -----Original Message-----
> From: mlw [mailto:pgsql(at)mohawksoft(dot)com]
> Sent: Friday, January 03, 2003 12:47 PM
> To: Shridhar Daithankar
> Cc: PGHackers
> Subject: Re: [HACKERS] Threads
>
>
> Please no threading threads!!!
>
> Has anyone calculated the interval and period of "PostgreSQL needs
> threads" posts?
>
> The *ONLY* advantage threading has over multiple processes is the time
> and resources used in creating new processes.

Threading is absurdly easier to do portably than fork().

Will you fork() successfully on MVS, VMS, OS/2, Win32?

On some operating systems, thread creation is absurdly faster than
process creation (many orders of magnitude).
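To make the creation-cost comparison concrete, here is a minimal C++ sketch that times creating and joining N empty threads against forking and reaping N child processes that exit immediately. It assumes a POSIX system (fork(), waitpid()), so it says nothing about MVS, VMS, OS/2 or Win32, where fork() may not be available at all. The function names are hypothetical, and the ratio you observe will depend heavily on the operating system.

```cpp
#include <chrono>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Time creating and joining `n` threads whose body does nothing,
// so only creation + join cost is measured. Returns milliseconds.
double time_threads_ms(int n) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        std::thread t([] {});
        t.join();
    }
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Time forking and reaping `n` child processes that exit immediately.
// POSIX-only assumption. Returns milliseconds, or -1.0 if fork() fails.
double time_processes_ms(int n) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid < 0) return -1.0;   // fork failed
        if (pid == 0) _exit(0);     // child: exit without running atexit handlers
        int status = 0;
        waitpid(pid, &status, 0);   // parent: reap the child
    }
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

For example, compare time_threads_ms(1000) with time_processes_ms(1000) on the platform you care about; no particular ratio is implied here.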

> That being said, I admit that creating a threaded program is easier
> than one with multiple processes, but PostgreSQL is already there and
> working.
>
> Drawbacks to a threaded model:
>
> (1) One thread screws up, the whole process dies. In a multiple
> process application this is not too much of an issue.

If you use C++ you can try/catch and nothing bad happens to anything but
the naughty thread.
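A minimal sketch of the try/catch idea, assuming each connection runs in its own std::thread and handle_connection() is a hypothetical stand-in for real per-connection work. The catch block has to live inside the thread function: an exception that escapes a std::thread's function calls std::terminate() and ends the whole process. Standard C++ exceptions also do not cover faults such as a stray pointer write or a segmentation violation, so those are outside what this pattern contains.

```cpp
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-connection handler: connection 2 fails with an exception.
void handle_connection(int id) {
    if (id == 2)
        throw std::runtime_error("bad query on connection " + std::to_string(id));
}

// Run each handler in its own thread and catch exceptions inside that thread,
// so one failing handler does not reach std::terminate().
// Returns how many handlers failed.
int run_workers(int n) {
    std::vector<int> failed(n, 0);  // each thread writes only its own slot (no data race)
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([i, &failed] {
            try {
                handle_connection(i);
            } catch (const std::exception&) {
                failed[i] = 1;      // the error stays contained to this thread
            }
        });
    }
    for (auto& w : workers) w.join();
    int count = 0;
    for (int f : failed) count += f;
    return count;
}
```

With four workers, run_workers(4) returns 1: only the thread handling connection 2 fails, and the other three finish normally.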

> (2) Heap fragmentation. In a long-uptime application, such as a
> database, heap fragmentation is an important consideration. With
> multiple processes, each process manages its own heap, and whatever
> fragmentation exists goes away when the connection is closed. A
> threaded server is far more vulnerable because the heap has to manage
> many threads and has to stay active and unfragmented in perpetuity.
> This is why Windows applications usually end up using 2G of memory
> after 3 months of use. (Well, this AND memory leaks.)

Poorly written applications leak memory. Fragmentation is a legitimate
concern.

> (3) Stack space. In a threaded application there are more limits on
> stack usage. I'm not sure, but I bet PostgreSQL would have a problem
> with a fixed-size stack; I know the old ODBC driver did.

A single server with 20 threads will consume less total free-store
(heap) and automatic (stack) memory than 20 separate server processes.
You have to decide how much stack to give each thread, that's true.
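Choosing a per-thread stack size might look like this with POSIX threads called from C++. The Job struct, the recursive sum_to() workload and the 1 MiB figure in the usage note are illustrative assumptions, not anything PostgreSQL does. pthread_attr_setstacksize() fails (with EINVAL) if the requested size is below the system minimum.

```cpp
#include <cstddef>
#include <pthread.h>

// Hypothetical unit of work handed to a thread.
struct Job {
    int n;        // input: recursion depth
    long result;  // output: written by the worker thread
};

// Deliberately recursive so stack usage grows with n
// (an optimizing compiler may turn it into a loop).
static long sum_to(int n) { return n == 0 ? 0 : n + sum_to(n - 1); }

static void* worker(void* arg) {
    Job* job = static_cast<Job*>(arg);
    job->result = sum_to(job->n);
    return nullptr;
}

// Run `job` on a thread whose stack size is fixed at `stack_bytes`.
// Returns 0 on success, or the pthread error code on failure.
int run_with_stack(Job* job, std::size_t stack_bytes) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);  // EINVAL if too small
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, worker, job);
        if (rc == 0) rc = pthread_join(tid, nullptr);    // join makes job->result visible
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

For example, with Job job{1000, 0}, run_with_stack(&job, 1 << 20) returns 0 and leaves job.result equal to 500500, while a request for a 1-byte stack is rejected before any thread is created.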

> (4) Lock contention. The various single points of access in a process
> have to be serialized for multiple threads: heap allocation,
> deallocation, etc. all have to be managed. In a multiple-process
> model, these resources would be separated by process contexts.

Semaphores are more complicated than critical sections. If anything, a
shared memory approach is more problematic and fragile, especially when
porting to multiple operating systems.
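Inside one process, serializing a shared resource needs only a mutex (std::mutex here plays the role of a Win32 critical section), whereas separate processes would need a shared memory segment plus a semaphore. The sketch below uses a hypothetical counter updated by several threads; it also shows the cost named in point (4): every access goes through the same lock, so contention grows with the number of threads.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// A shared counter guarded by a mutex: the in-process analogue of a critical section.
class SharedCounter {
public:
    void add(long delta) {
        std::lock_guard<std::mutex> guard(mu_);  // serialize access to value_
        value_ += delta;
    }
    long get() {
        std::lock_guard<std::mutex> guard(mu_);
        return value_;
    }
private:
    std::mutex mu_;
    long value_ = 0;
};

// Have `threads` workers each add 1 to one shared counter `iters` times.
long hammer(int threads, int iters) {
    SharedCounter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) counter.add(1);
        });
    for (auto& th : pool) th.join();
    return counter.get();
}
```

hammer(8, 10000) returns 80000; without the lock_guard the increments would be a data race and the total could come up short.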

> (5) Lastly, why bother? Seriously? Process creation time is an issue,
> true, but it's an issue with threads as well, just not as bad. Anyone
> who is looking for performance should be using a connection pooling
> mechanism, as is done in things like PHP.
>
> I have done both threaded and process servers. The threaded servers
> are easier to write. The process-based servers are more robust. From
> an operational point of view, a "select foo from bar where x > y" will
> take the same amount of time.

Probably true. I think a better solution is a server that can start
threads or processes or both. But that's neither here nor there and I'm
certainly not volunteering to write it.

Here is a solution to the dilemma. Make the one who suggests the
feature be the first volunteer on the team that writes it.

Is it a FAQ? If not, it ought to be.
