Re: Spinlock performance improvement proposal

From: mlw <markw(at)mohawksoft(dot)com>
To: "D(dot) Hageman" <dhageman(at)dracken(dot)com>
Cc: Ian Lance Taylor <ian(at)airs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spinlock performance improvement proposal
Date: 2001-09-27 14:02:05
Message-ID: 3BB3315D.EC99FF65@mohawksoft.com
Lists: pgsql-hackers

"D. Hageman" wrote:

> On 26 Sep 2001, Ian Lance Taylor wrote:
> >
> > > Save for the fact that the kernel can switch between threads faster than
> > > it can switch processes, considering threads share the same address space,
> > > stack, code, etc. If need be, sharing the data between threads is much
> > > easier than sharing between processes.
> >
> > When using a kernel threading model, it's not obvious to me that the
> > kernel will switch between threads much faster than it will switch
> > between processes. As far as I can see, the only potential savings is
> > not reloading the pointers to the page tables. That is not nothing,
> > but it is also not a lot.
>
> It is my understanding that avoiding a full context switch of the
> processor can be of a significant advantage. This is especially important
> on processor architectures that can be kinda slow at doing it (x86). I
> will admit that most modern kernels have features that assist software
> packages utilizing the forking model (copy on write for instance). It is
> also my impression that these do a good job. I am the kind of guy that
> looks towards the future (as in a year, a year and a half or so) and says that
> processors will hopefully get faster at context switching and more and
> more kernels will implement these algorithms to speed up the forking
> model. At the same time, I see more and more processors being shoved into
> a single box, and it appears that the threads model works better on this
> type of system.

"context" switching happens all the time on a multitasking system. On the x86
processor, a kind of context switch happens every time you call into the kernel: you have to go
through a call gate to get to a more privileged (lower-numbered) ring. That kind of switching is very
fast. The operating system dictates how heavy or light a process switch is. Under
Linux (and, I believe, FreeBSD with Linux threads, or version 4.x), threads and
processes are virtually identical. The only difference is that threads share the same virtual memory
pages instead of getting "copy on write" copies of them. Process and thread scheduling is also virtually
identical.
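To make the copy-on-write difference concrete, here is a minimal POSIX C sketch, assuming a Linux/glibc system. `demo_global` and the helper functions are hypothetical names, not PostgreSQL code. A write from a forked child lands on the child's private copy of the page, so the parent never sees it; the same write from a thread lands on the one page both threads share.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An ordinary global, the kind a backend has many of (hypothetical name). */
static int demo_global = 0;

static void *bump_global(void *arg) {
    (void)arg;
    demo_global++;
    return NULL;
}

/* A forked child increments the global; returns the value the parent sees. */
int global_after_fork(void) {
    demo_global = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        bump_global(NULL);  /* the write gives the child its own copy of the page */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return demo_global;     /* parent still sees 0 */
}

/* A thread increments the same global; returns the value the creator sees. */
int global_after_thread(void) {
    demo_global = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, bump_global, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return demo_global;     /* 1: both threads use the same page */
}
```

Scheduling-wise the two cases look alike to the kernel; what differs is whether the memory is shared.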

If you look to the future, then you should also expect process switching to
become more efficient as operating systems improve.

>
> > > I can't comment on the "isolate data" line. I am still trying to figure
> > > that one out.
> >
> > Sometimes you need data which is specific to a particular thread.
>
> When you need data that is specific to a thread you use a TSD (Thread
> Specific Data).

Yes, but Postgres has many global variables. The code has always assumed that each backend
is a stand-alone process that shares data explicitly (through shared memory), not implicitly.
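For reference, TSD in POSIX is the `pthread_key_create` / `pthread_getspecific` / `pthread_setspecific` family. The sketch below assumes a POSIX threads platform; the counter and function names are hypothetical. Each thread bumps what looks like one counter but is really its own private copy. Every access goes through a function call, which hints at how invasive it would be to route a large number of plain globals through this mechanism.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* One key shared by all threads; the value stored under it is per-thread. */
static pthread_key_t counter_key;
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

static void make_counter_key(void) {
    /* free() runs on each thread's non-NULL value when that thread exits. */
    pthread_key_create(&counter_key, free);
}

/* Returns the calling thread's private counter, allocating it on first use. */
static int *thread_counter(void) {
    pthread_once(&counter_key_once, make_counter_key);
    int *c = pthread_getspecific(counter_key);
    if (c == NULL) {
        c = calloc(1, sizeof *c);
        if (c == NULL)
            abort();
        pthread_setspecific(counter_key, c);
    }
    return c;
}

/* Each worker bumps "its" counter *arg times, then reports what it sees. */
static void *worker(void *arg) {
    int *n = arg;
    for (int i = 0; i < *n; i++)
        (*thread_counter())++;
    *n = *thread_counter();
    return NULL;
}

/* Runs two workers with different workloads; returns 0 on success. */
int tsd_demo(int *out_a, int *out_b) {
    int a = 3, b = 5;
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    *out_a = a;   /* 3: only the first thread's increments */
    *out_b = b;   /* 5: only the second thread's increments */
    return 0;
}
```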

>
> > Basically, you have to look at every global variable in the Postgres
> > backend, and determine whether to share it among all threads or to
> > make it thread-specific.
>
> Yes, if one were to implement threads in PostgreSQL, I would think that
> several areas would need some rewriting. Like I said before, it would
> give a person a chance to restructure things so future TODO items wouldn't
> be so hard to implement. Personally, I like to stay away from global
> variables as much as possible. They just get you into trouble.

In real-life software, software which lives from year to year with active
development, things do get messy. There are always global variables involved in a
program. Efforts should, of course, be made to keep them to a minimum, but the
reality is that they always happen.
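The per-global decision described in the quoted text can be sketched in C as follows. The variables `total_queries` and `xact_depth` and the helper functions are hypothetical, not real PostgreSQL names, and `_Thread_local` assumes a C11 compiler. A global that stays shared now needs a lock around every access. A global that becomes thread-specific gives each thread the private copy each forked backend used to get for free.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/* Stays shared by all threads, so every access now needs a lock. */
static long total_queries = 0;
static pthread_mutex_t total_queries_lock = PTHREAD_MUTEX_INITIALIZER;

/* Becomes thread-specific: one copy per thread, as one per process before. */
static _Thread_local int xact_depth = 0;

static void note_query(void) {
    pthread_mutex_lock(&total_queries_lock);
    total_queries++;
    pthread_mutex_unlock(&total_queries_lock);
}

/* One "session" thread: runs 100 queries and opens one transaction. */
static void *session(void *arg) {
    int *depth_out = arg;
    for (int i = 0; i < 100; i++)
        note_query();
    xact_depth++;             /* touches only this thread's copy */
    *depth_out = xact_depth;  /* so this is always 1 */
    return NULL;
}

#define MAX_SESSIONS 16

/* Runs up to n sessions concurrently; returns the shared query total. */
long run_sessions(int n, int depths[]) {
    pthread_t t[MAX_SESSIONS];
    int started;
    if (n > MAX_SESSIONS)
        n = MAX_SESSIONS;
    pthread_mutex_lock(&total_queries_lock);
    total_queries = 0;
    pthread_mutex_unlock(&total_queries_lock);
    for (started = 0; started < n; started++)
        if (pthread_create(&t[started], NULL, session, &depths[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return total_queries;     /* safe to read unlocked: all threads joined */
}
```

With four sessions, the shared total is 400 and every session reports a depth of 1; forgetting the lock on the first variable, or the `_Thread_local` on the second, would silently break one of those results.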

Also, the very structure of function calls may need to change when going from a
process model to a threaded model. Functions that never before needed to be reentrant must now be
reentrant; think about that. That is a huge undertaking. Every single function
may need to be examined for thread safety, with little benefit.
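A typical example of the reentrancy problem, as a hypothetical C sketch (`format_oid` and `format_oid_r` are illustrative names, not PostgreSQL functions): a helper that returns a pointer into a static buffer is safe in a single-threaded backend process, but two threads calling it can clobber each other's result. The usual fix, as with the C library's `strtok` and `strtok_r`, changes the signature and therefore every call site.

```c
#include <stddef.h>
#include <stdio.h>

/* Non-reentrant: every caller gets a pointer into the same static buffer.
 * Harmless in a single-threaded process; in a threaded server another
 * thread can overwrite the result between the call and its use. */
const char *format_oid(unsigned int oid) {
    static char buf[32];
    snprintf(buf, sizeof buf, "oid=%u", oid);
    return buf;
}

/* Reentrant rewrite: the caller supplies the storage, so calls share no
 * state. The signature, and so every call site, has to change. */
char *format_oid_r(unsigned int oid, char *buf, size_t len) {
    snprintf(buf, len, "oid=%u", oid);
    return buf;
}
```

Even without threads, the shared buffer shows through: the pointer returned by `format_oid(1)` is the same one returned by `format_oid(2)`, and it now reads "oid=2".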

>
> > > That last line is a troll if I ever saw it ;-) I will agree that threads
> > > aren't for everything and that they have costs just like everything else. Let
> > > me stress that last part - like everything else. Certain costs exist in
> > > the present model; nothing is - how should we say ... perfect.
> >
> > When writing in C, threading inevitably loses robustness. Erratic
> > behaviour by one thread, perhaps in a user defined function, can
> > subtly corrupt the entire system, rather than just that thread. Part
> > of defensive programming is building barriers between different parts
> > of a system. Process boundaries are a powerful barrier.
>
> I agree with everything you wrote above except for the first line. My
> only comment is that process boundaries are only *truly* a powerful
> barrier if the processes are different pieces of code and are not
> dependent on each other in crippling ways. Forking the same code with the
> bug in it - and only 1 in 5 die - is still 4 copies of buggy code running
> on your system ;-)

This is simply not true. All software has bugs; it is an undeniable fact. Some
bugs are more likely to be hit than others. With 5 processes, when one process hits a
bug, that does not mean the other 4 will hit the same bug. Obscure bugs kill
software all the time; the trick is to minimize the impact. Software is not
perfect, and assuming it can be is a mistake.
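The isolation argument can be sketched with fork(), assuming a POSIX system. `backend_main` and `surviving_workers` are hypothetical names, and `abort()` merely stands in for whatever fatal bug one worker hits. Five workers run the same code; one dies, and the parent sees the other four exit cleanly. If the same five workers were threads in one process, the same crash would take down all of them.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical backend body: worker `buggy` hits a fatal bug. */
static void backend_main(int id, int buggy) {
    if (id == buggy)
        abort();   /* stands in for a crash such as a wild-pointer write */
    _exit(0);      /* every other worker finishes normally */
}

/* Forks n workers running the same code; returns how many exited cleanly. */
int surviving_workers(int n, int buggy) {
    int forked = 0, ok = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            backend_main(i, buggy);  /* never returns */
        forked++;
    }
    for (int i = 0; i < forked; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```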
