Re: Spinlock performance improvement proposal

From: mlw <markw(at)mohawksoft(dot)com>
To: Chamanya <chamanya(at)yahoo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spinlock performance improvement proposal
Date: 2001-09-29 15:00:06
Message-ID: 3BB5E1F6.75FD5A0@mohawksoft.com
Lists: pgsql-hackers

Chamanya wrote:
>
> On Thursday 27 September 2001 04:09, you wrote:
> > This depends on your system. Solaris has a huge difference between
> > thread and process context switch times, whereas Linux has very little
> > difference (and in fact a Linux process context switch is about as
> > fast as a Solaris thread switch on the same hardware--Solaris is just
> > a pig when it comes to process context switching).
>
> I have never worked on any big systems but from what (little) I have seen, I
> think there should be a hybrid model.
>
> This whole discussion started off, from poor performance on SMP machines. If
> I am getting this correctly, threads can be spread on multiple CPUs if
> available but process can not.

Different processes will be spread evenly across all CPUs in an SMP
machine, unless you set process affinity to bind a process to a particular CPU.
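
To illustrate the affinity point, here is a minimal sketch. It assumes Linux
and Python's os.sched_getaffinity / os.sched_setaffinity (not available on
every platform); pin_to_cpu is a hypothetical helper name, not something from
this thread. By default the process may run on any CPU the scheduler picks;
after pinning, it is restricted to one.

```python
import os


def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU; return the new affinity set.

    Assumption: a Linux-like platform that exposes os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    os.sched_setaffinity(0, {cpu})   # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        print("default: may run on CPUs", sorted(allowed))
        print("pinned:  may run on CPUs", sorted(pin_to_cpu(min(allowed))))
```

Without the explicit call, nothing ties a process to one CPU, which is why
separate backend processes can already use all CPUs of an SMP machine.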
>
> So I would suggest to have threaded approach for intensive tasks such as
> sorting/searching etc. IMHO converting entire paradigm to thread based is a
> huge task and may not be required in all cases.

Dividing a query into multiple threads is an amazing task. I wish I had a
couple years and someone willing to pay me to try it.

>
> I think of an approach. Threads are created when they are needed but they
> are kept dormant when not needed. So that there is no recreation overhead(if
> that's a concern). So at any given point of time, one back end connection has
> as many threads as number of CPUs. More than that may not yield much of
> performance improvement. Say a big task like sorting is split and given to
> different threads so that it can use them all.

This is a huge undertaking, and quite frankly, if I understand PostgreSQL
correctly, it would mean a complete redesign of the entire system.
>
> It should be easy to switch the threading function and arguments on the fly,
> restricting number of threads and there will not be much of thread switching
> as each thread handles different parts of task and later the results are
> merged.

That is not what I would consider easy.
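
For reference, the split-and-merge idea in its simplest standalone form might
look like the sketch below. This is an illustration only: split and
parallel_sort are hypothetical names, the data is an in-memory list, and
nothing here touches backend state such as shared buffers, locks, or memory
contexts. Note also that in CPython the chunk sorts hold the interpreter lock,
so this shows the structure of the approach, not a speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Cut items into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_sort(items, workers):
    """Sort each chunk on a worker thread, then merge the sorted chunks."""
    chunks = split(list(items), workers)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge performs a k-way merge of already-sorted inputs.
    return list(heapq.merge(*sorted_chunks))


if __name__ == "__main__":
    print(parallel_sort([5, 3, 9, 1, 7, 2, 8], workers=3))
```

Everything that makes the proposal hard lies outside this toy: the data being
sorted, and the code sorting it, would have to be safe to touch from several
threads at once.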

>
> Number of threads should be equal to or twice that of number of CPUs. I don't
> think more than those many threads would yield any performance improvement.

That isn't true at all.

One of the problems I see when people discuss performance on an SMP
machine is that they usually think from the perspective of a single task. If
you are doing data mining, one SQL query may take a very long time, which may
be a problem, but in the grander scheme of things there are usually multiple
concurrent performance issues to be considered. Threading the back end for
parallel query processing will probably not help with those. More often than
not, a database has much more to do than one thing at a time.

Also, if you are threading query processing, you have to analyze what your
query needs to do with the threads. If your query is CPU bound, then you will
want to use fewer threads; if your query is I/O bound, you should have as many
threads as you have I/O requests, and have each thread block on the I/O.
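
A rough sketch of that sizing rule follows. The names pick_thread_count,
fake_read, and run_io_requests are hypothetical, and capping CPU-bound work at
the CPU count is an assumption used here to stand in for "fewer threads". The
simulated read just sleeps in place of a blocking disk request.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def pick_thread_count(cpu_bound, pending_io, cpus):
    """Choose a worker count from the kind of work the query does."""
    if cpu_bound:
        # Assumption: extra threads beyond the CPU count only add switching.
        return max(1, cpus)
    # I/O bound: one thread per outstanding request, each blocking on its I/O.
    return max(1, pending_io)


def fake_read(block):
    """Stand-in for a blocking read of one disk block."""
    time.sleep(0.05)
    return block


def run_io_requests(blocks):
    """Issue every read on its own thread so the waits overlap."""
    workers = pick_thread_count(False, len(blocks), os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_read, blocks))


if __name__ == "__main__":
    start = time.perf_counter()
    run_io_requests(list(range(16)))
    print("16 overlapped reads took %.2fs" % (time.perf_counter() - start))
```

With one thread per request the sixteen waits overlap instead of running back
to back, which is the reason the I/O-bound case wants many more threads than
there are CPUs.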

>
> And with this approach we can migrate one functionality at a time to threaded
> one, thus avoiding big effort at any given time.

Perhaps I am being overdramatic, but I have moved a number of systems from
fork() to threads (for ports from UNIX to Windows NT), and if my opinion means
anything on this mailing list, I STRONGLY urge against it. PostgreSQL is a huge
system, over a decade old. The original developers are no longer working on it,
and in fact, probably wouldn't recognize it. There are nooks and crannies that
no one knows about.

It has also been my experience that going from separate processes to separate
threads does not do much for performance, simply because the operation of your
system does not change, only the method by which you share memory. If you want
to multithread a single query, that's a different story and a good R&D project
in itself.
