Re: Scalability in postgres

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Flavio Henrique Araque Gurgel <flavio(at)4linux(dot)com(dot)br>, Fabrix <fabrixio1(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Scalability in postgres
Date: 2009-06-04 23:04:37
Message-ID: 4A285305.2070103@mark.mielke.cc
Lists: pgsql-performance

Kevin Grittner wrote:
> James Mansion <james(at)mansionfamily(dot)plus(dot)com> wrote:
>
>> Kevin Grittner wrote:
>>
>>> Sure, but the architecture of those products is based around all
>>> the work being done by "engines" which try to establish affinity to
>>> different CPUs, and loop through the various tasks to be done. You
>>> don't get a context switch storm because you normally have the
>>> number of engines set at or below the number of CPUs. The down
>>> side is that they spend a lot of time spinning around queue access
>>> to see if anything has become available to do -- which causes them
>>> not to play nice with other processes on the same box.
>>>
>>>
>> This is just misleading at best.
>>
>
> What part? Last I checked, Sybase ASE and SQL Server worked as I
> described. Those are the products I was describing. Or is it
> misleading to say that you aren't likely to get a context switch storm
> if you keep your active thread count at or below the number of CPUs?
>

A context switch storm is about how the application and runtime implement
concurrent access to shared resources, not about the capabilities of the
operating system. For example, if all threads spin every time a
condition or event is raised, then yes, a context switch storm probably
occurs if there are thousands of threads. But it doesn't have to work
that way. At its very simplest, this is the difference between "wake one
thread" (which is then responsible for waking the next thread) vs "wake
all threads". This isn't necessarily the best solution - but it is one
alternative. Other solutions might involve waking the *right* thread.
For example, if I know that a particular thread is waiting on my change
and it has the highest priority - perhaps I only need to wake that one
thread. Or, if I know that 10 threads are waiting on my results and can
act on them, I only need to wake these specific 10 threads. Any system
which actually wakes all threads will probably exhibit scaling limitations.
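
To make the "wake one" vs "wake all" difference concrete, here is a
minimal sketch in Python. It is only an illustration, not how any
database or JVM does it: count_wakeups and the field names are made up
for this example, and threading.Condition stands in for whatever
primitive the runtime really uses. It counts how many wake-ups found
work ("useful") and how many found nothing left to do ("wasted").

```python
import threading

def count_wakeups(num_waiters, num_items, wake_all):
    """Publish num_items one at a time; return (useful, wasted) wake-ups."""
    lock = threading.Lock()
    work = threading.Condition(lock)   # waiters sleep here
    idle = threading.Condition(lock)   # publisher sleeps here
    s = {"items": 0, "asleep": 0, "useful": 0, "wasted": 0, "stop": False}

    def waiter():
        with lock:
            while True:
                s["asleep"] += 1
                idle.notify()          # tell the publisher we are parked
                work.wait()            # releases the lock while asleep
                if s["stop"]:
                    return
                if s["items"] > 0:
                    s["items"] -= 1
                    s["useful"] += 1
                else:
                    s["wasted"] += 1   # woken, but the work was already taken

    threads = [threading.Thread(target=waiter) for _ in range(num_waiters)]
    for t in threads:
        t.start()

    with lock:
        for _ in range(num_items):
            # Only publish once every waiter is parked again.
            idle.wait_for(lambda: s["asleep"] == num_waiters)
            s["items"] += 1
            if wake_all:
                s["asleep"] = 0        # every parked waiter is released
                work.notify_all()
            else:
                s["asleep"] -= 1       # exactly one parked waiter is released
                work.notify()
        idle.wait_for(lambda: s["asleep"] == num_waiters)
        s["stop"] = True
        work.notify_all()              # shutdown wake-ups are not counted
    for t in threads:
        t.join()
    return s["useful"], s["wasted"]
```

With 8 waiters and 5 items, the wake-one variant does 5 useful wake-ups
and none wasted, while the wake-all variant wakes 7 extra threads per
item just so they can go back to sleep. The wasted count grows with the
number of waiters, which is the scaling problem described above.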

The operating system itself only needs to keep threads in the run queue
if they have work to do. Having thousands of idle threads does not need
to cost *any* cpu time, if they're kept in an idle thread collection
separate from the run queue.

>> I'm sorry, but (in particular) UNIX systems have routinely
>> managed large numbers of runnable processes where the run queue
>> lengths are long without such an issue.
>>
> Well, the OP is looking at tens of thousands of connections. If we
> have a process per connection, how many tens of thousands can we
> handle before we get into problems with exhausting possible pid
> numbers (if nothing else)?
>

This depends on whether the system uses 16-bit or 32-bit pid numbers. I
believe Linux supports 32-bit pid numbers although I'm not up-to-date on
what the default configurations are for all systems in use today. In
particular, Linux 2.6 introduced the O(1) task scheduler, with the
express requirement of supporting hundreds of thousands of (mostly
idle) threads. The support exists. Is it activated or in proper use? I
don't know.
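
For anyone who wants to check what a given box is configured for: on
Linux the pid limit is a runtime setting exposed at
/proc/sys/kernel/pid_max. The sketch below just reads that file. The
function name read_pid_max is made up for this example, and it returns
None on systems where the file does not exist.

```python
from pathlib import Path

def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the kernel's configured pid limit, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file contents.
        return None

limit = read_pid_max()
if limit is None:
    print("pid_max not available on this system")
else:
    print(f"pid_max = {limit}")
```

Historically the default has been 32768 on many installations, and it
can be raised by an administrator, so the value printed here matters
more than the width of the pid type itself.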

> I know that if you do use a large number of threads, you have to be
> pretty adaptive. In our Java app that pulls data from 72 sources and
> replicates it to eight, plus feeding it to filters which determine
> what publishers for interfaces might be interested, the Sun JVM does
> very poorly, but the IBM JVM handles it nicely. It seems they use
> very different techniques for the monitors on objects which
> synchronize the activity of the threads, and the IBM technique does
> well when no one monitor is dealing with a very large number of
> blocking threads. They got complaints from people who had thousands
> of threads blocking on one monitor, so they now keep a count and
> switch techniques for an individual monitor if the count gets too
> high.
>
Could be, and if so then the Sun JVM should really address the problem.
However, having thousands of threads waiting on one monitor probably
isn't a scalable solution, regardless of whether the JVM is able to
optimize around your usage pattern or not. Why have thousands of threads
waiting on one monitor? That's a bit insane. :-)

You should really only have 1X or 2X as many threads as there are CPUs
waiting on one monitor. Beyond that is waste. The idle threads can be
pooled away, and only activated (with individual monitors which can be
far more easily and effectively optimized) when the other threads become
busy.
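
As a rough illustration of the 1X or 2X sizing, here is a sketch using
Python's standard ThreadPoolExecutor. It is an approximation of the
idea rather than the pooling scheme itself: requests queue up, and only
a CPU-proportional number of worker threads ever contend for the shared
queue. pool_size and run_bounded are names invented for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def pool_size(multiplier=2):
    """Number of worker threads: multiplier times the CPU count, at least 1."""
    return max(1, (os.cpu_count() or 1) * multiplier)

def run_bounded(tasks, multiplier=2):
    """Run zero-argument callables on a CPU-sized pool; keep input order."""
    with ThreadPoolExecutor(max_workers=pool_size(multiplier)) as pool:
        # However many tasks are submitted, at most pool_size() threads
        # exist to compete for the executor's internal work queue.
        return list(pool.map(lambda task: task(), tasks))

# Usage: 1000 requests, but only a handful of threads doing the work.
results = run_bounded([lambda i=i: i * i for i in range(1000)])
print(len(results), "tasks done on", pool_size(), "threads")
```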

> Perhaps something like that (or some other new approach) might
> mitigate the effects of tens of thousands of processes competing
> for a few resources, but it fundamentally seems unwise to turn those
> loose to compete if requests can be queued in some way.
>

An alternative approach might be: 1) Idle processes not currently
running a transaction do not need to be consulted for their snapshot
(and other related expenses) - if they are idle for a period of time,
they "unregister" from the active process list - if they become
active again, they "register" in the active process list, and 2)
Processes could be reusable across different connections - they could
stick around for a period after disconnect, and make themselves
available again to serve the next connection.

Still heavy-weight in terms of memory utilization, but cheap in terms of
other impacts, and without the cost of connection "pooling" in the sense
of requests always being routed indirectly through a proxy of some sort.
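
A toy model of point 1, to show where the saving would come from. This
is not how PostgreSQL builds snapshots; ActiveRegistry, backend_id and
xid are names assumed purely for illustration. The point is that the
cost of taking a snapshot depends on the number of registered (active)
backends, not on the number of connected ones.

```python
import threading

class ActiveRegistry:
    """Tracks only the backends that are currently running a transaction."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}              # backend_id -> transaction id

    def register(self, backend_id, xid):
        """Called when an idle backend starts a transaction."""
        with self._lock:
            self._active[backend_id] = xid

    def unregister(self, backend_id):
        """Called when a backend has gone idle; harmless if not registered."""
        with self._lock:
            self._active.pop(backend_id, None)

    def snapshot(self):
        """Return in-progress transaction ids, visiting active backends only."""
        with self._lock:
            return sorted(self._active.values())

# Usage: 10,000 backends may be connected, but only two are consulted.
registry = ActiveRegistry()
registry.register(backend_id=17, xid=5001)
registry.register(backend_id=42, xid=5002)
print(registry.snapshot())
registry.unregister(17)
print(registry.snapshot())
```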

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
