Re: Reasoning behind process instead of thread based

From: Thomas Hallgren <thhal(at)mailblocks(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: nd02tsk(at)student(dot)hig(dot)se, pgsql-general(at)postgresql(dot)org
Subject: Re: Reasoning behind process instead of thread based
Date: 2004-10-27 22:48:51
Message-ID: 418025D3.5090205@mailblocks.com
Lists: pgsql-general

Tom Lane wrote:
> Right. Depending on your OS you may be able to catch a signal that
> would kill a thread and keep it from killing the whole process, but
> this still leaves you with a process memory space that may or may not
> be corrupted. Continuing in that situation is not cool, at least not
> according to the Postgres project's notions of reliable software design.
>
There can't be any "may or may not" involved. You must of course know
what went wrong.

It is very common that you either get a null pointer exception (attempt
to access address zero), that your stack will hit a write protected page
(stack overflow), or that you get some sort of arithmetic exception.
These conditions can be trapped and gracefully handled. The signal
handler must be able to check the cause of the exception. This usually
involves stack unwinding and investigating the state of the CPU at the
point where the signal was generated. The process must be terminated if
the reason is not a recognized one.
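
A minimal sketch of that kind of handler, assuming a POSIX system with
sigaction() and SA_SIGINFO. The names classify_fault, run_guarded and
divide_task are hypothetical and only illustrate the idea; this is not
code from PostgreSQL or any other server. Stack overflow is left out
because it additionally needs an alternate signal stack (sigaltstack),
and which si_code values are worth treating as recoverable is a policy
assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

typedef enum { FAULT_FATAL = 0, FAULT_RECOVERABLE = 1 } fault_class;

/* Hypothetical policy: only causes we positively recognize are recoverable. */
fault_class classify_fault(int signo, int code, const void *addr)
{
    switch (signo) {
    case SIGFPE:    /* integer arithmetic exception */
        return (code == FPE_INTDIV || code == FPE_INTOVF)
            ? FAULT_RECOVERABLE : FAULT_FATAL;
    case SIGSEGV:   /* unmapped access in the first page: treat as NULL dereference */
        return (code == SEGV_MAPERR && (uintptr_t) addr < 4096)
            ? FAULT_RECOVERABLE : FAULT_FATAL;
    default:
        return FAULT_FATAL;
    }
}

static _Thread_local sigjmp_buf *recovery_point;        /* set while guarded code runs */
static _Thread_local volatile sig_atomic_t last_fault;  /* signal we recovered from */

static void fault_handler(int signo, siginfo_t *info, void *ucontext)
{
    (void) ucontext;    /* CPU state; a fuller handler might inspect it */
    if (recovery_point != NULL &&
        classify_fault(signo, info->si_code, info->si_addr) == FAULT_RECOVERABLE) {
        last_fault = signo;
        siglongjmp(*recovery_point, 1);
    }
    signal(signo, SIG_DFL);     /* unrecognized cause: let the default action */
    raise(signo);               /* terminate the whole process */
}

/* Install the handler for the two signals classified above; 0 on success. */
int install_fault_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, NULL) != 0 || sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Run fn(arg); return 0 if it completed, or the signal number it was
 * abandoned on. Whatever fn held at the time of the fault is not cleaned up. */
int run_guarded(void (*fn)(void *), void *arg)
{
    sigjmp_buf env;
    assert(fn != NULL);
    if (sigsetjmp(env, 1) == 0) {   /* 1: save and later restore the signal mask */
        recovery_point = &env;
        fn(arg);
        recovery_point = NULL;
        return 0;
    }
    recovery_point = NULL;
    return last_fault;
}

/* Example task: integer division, which traps on some CPUs when den == 0. */
struct div_job { int num, den, result; };

void divide_task(void *arg)
{
    struct div_job *job = arg;
    job->result = job->num / job->den;
}
```

A thread would call install_fault_handlers() once and then wrap each
unit of work in run_guarded(). Whether a zero divisor actually raises
SIGFPE depends on the CPU and compiler, so the sketch only decides what
to do once the signal has arrived.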

Out of memory can be managed using thread local allocation areas
(similar to MemoryContext) and killing a thread based on some criteria
when no more memory is available. A criterion could be the thread that
encountered the problem, the thread that consumes the most memory, the
thread that was least recently active, or something else.
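
The bookkeeping for this could look roughly like the sketch below. All
names (thread_arena, arena_pool, pick_victim and so on) are hypothetical,
and the sketch only tracks byte counts against a shared budget; it does
not allocate memory or cancel threads, and a real server would also need
locking around the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three example criteria from the text above. */
typedef enum { VICTIM_REQUESTER, VICTIM_LARGEST, VICTIM_LEAST_RECENT } victim_policy;

typedef struct {
    size_t   bytes_used;    /* memory currently charged to this thread */
    uint64_t last_active;   /* logical timestamp of its last request */
} thread_arena;

typedef struct {
    thread_arena *arenas;   /* one entry per thread */
    size_t        count;
    size_t        budget;       /* total bytes all threads may use */
    size_t        total_used;   /* invariant: total_used <= budget */
} arena_pool;

/* Charge nbytes to thread idx. Returns 0 on success, or -1 if the shared
 * budget would be exceeded (the "out of memory" condition). */
int arena_charge(arena_pool *pool, size_t idx, size_t nbytes, uint64_t now)
{
    assert(pool != NULL && idx < pool->count);
    pool->arenas[idx].last_active = now;
    if (nbytes > pool->budget - pool->total_used)
        return -1;
    pool->arenas[idx].bytes_used += nbytes;
    pool->total_used += nbytes;
    return 0;
}

/* Choose which thread to kill after `requester` failed to get memory.
 * Ties are resolved in favour of killing the requester itself. */
size_t pick_victim(const arena_pool *pool, size_t requester, victim_policy policy)
{
    assert(pool != NULL && requester < pool->count);
    size_t victim = requester;
    if (policy == VICTIM_REQUESTER)
        return victim;
    for (size_t i = 0; i < pool->count; i++) {
        const thread_arena *a = &pool->arenas[i];
        const thread_arena *v = &pool->arenas[victim];
        if (policy == VICTIM_LARGEST && a->bytes_used > v->bytes_used)
            victim = i;
        if (policy == VICTIM_LEAST_RECENT && a->last_active < v->last_active)
            victim = i;
    }
    return victim;
}

/* Give back everything the victim thread held; returns the bytes freed. */
size_t arena_release(arena_pool *pool, size_t idx)
{
    assert(pool != NULL && idx < pool->count);
    size_t freed = pool->arenas[idx].bytes_used;
    pool->arenas[idx].bytes_used = 0;
    pool->total_used -= freed;
    return freed;
}
```

Because every allocation is charged to exactly one thread, releasing the
victim's arena returns all of its memory in one step, which is the same
property a MemoryContext reset gives a backend.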

> It should be pointed out that when we get a hard backend crash, Postgres
> will forcibly terminate all the backends and reinitialize; which means
> that in terms of letting concurrent sessions keep going, we are not any
> more forgiving than a single-address-space multithreaded server. The
> real bottom line here is that we have good prospects of confining the
> damage done by the failed process: it's unlikely that anything bad will
> happen to already-committed data on disk or that any other sessions will
> return wrong answers to their clients before we are able to kill them.
> It'd be a lot harder to say that with any assurance for a multithreaded
> server.
>
I'm not sure I follow. You will be able to bring all threads of one
process to a halt much faster than you can kill a number of external
processes. Killing the multithreaded process is more like pulling the plug.

Regards,
Thomas Hallgren
