Skip site navigation (1) Skip section navigation (2)

Re: sustained update load of 1-2k/sec

From: Bob Ippolito <bob(at)redivi(dot)com>
To: Mark Cotner <mcotner(at)yahoo(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: sustained update load of 1-2k/sec
Date: 2005-08-19 11:28:07
Message-ID: 9852B15E-8F9D-46F5-B4B4-9EAAE26F1AF7@redivi.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Aug 19, 2005, at 12:14 AM, Mark Cotner wrote:

> Excellent feedback.  Thank you.  Please do keep in mind I'm storing  
> the
> results of SNMP queries.  The majority of the time each thread is  
> in a wait
> state, listening on a UDP port for return packet.  The number of  
> threads is
> high because in order to sustain poll speed I need to minimize the  
> impact of
> timeouts and all this waiting for return packets.

Asynchronous IO via select/poll/etc. basically says: "given these 100  
sockets, wake me up when any of them has something to tell me, or  
wake me up anyway in N milliseconds".  From one thread, you can  
usually deal with thousands of connections without breaking a sweat,  
where with thread-per-connection you have so much overhead just for  
the threads that you probably run out of RAM before your network is  
throttled.  The reactor pattern basically just abstracts this a bit  
so that you worry about what do to when the sockets have something to  
say, and also allow you to schedule timed events, rather than having  
to worry about how to implement that correctly *and* write your  
application.

With 100 threads you are basically invoking a special-case of the  
same mechanism that only looks at one socket, but this makes for 100  
different data structures that end up in both userspace and kernel  
space, plus the thread stacks (which can easily be a few megs each)  
and context switching when any of them wakes up..  You're throwing a  
lot of RAM and CPU cycles out the window by using this design.

Also, preemptive threads are hard.

> I had intended to have a fallback plan which would build a thread  
> safe queue
> for db stuffs, but the application isn't currently architected that  
> way.
> It's not completely built yet so now is the time for change.  I hadn't
> thought of building up a batch of queries and creating a  
> transaction from
> them.

It should be *really* easy to just swap out the implementation of  
your "change this record" function with one that simply puts its  
arguments on a queue, with another thread that gets them from the  
queue and actually does the work.

> I've been looking into memcached as a persistent object store as  
> well and
> hadn't seen the reactor pattern yet.  Still trying to get my puny  
> brain
> around that one.

memcached is RAM based, it's not persistent at all... unless you are  
sure all of your nodes will be up at all times and will never go  
down.  IIRC, it also just starts throwing away data once you hit its  
size limit.  If course, this isn't really any different than MySQL's  
MyISAM tables if you hit the row limit, but I think that memcached  
might not even give you an error when this happens.  Also, memcached  
is just key/value pairs over a network, not much of a database going  
on there.

If you can fit all this data in RAM and you don't care so much about  
the integrity, you might not benefit much from a RDBMS at all.   
However, I don't really know what you're doing with the data once you  
have it so I might be very wrong here...

-bob

>
> Again, thanks for the help.
>
> 'njoy,
> Mark
>
>
> On 8/19/05 5:09 AM, "Bob Ippolito" <bob(at)redivi(dot)com> wrote:
>
>
>>
>> On Aug 18, 2005, at 10:24 PM, Mark Cotner wrote:
>>
>>
>>> I'm currently working on an application that will poll
>>> thousands of cable modems per minute and I would like
>>> to use PostgreSQL to maintain state between polls of
>>> each device.  This requires a very heavy amount of
>>> updates in place on a reasonably large table(100k-500k
>>> rows, ~7 columns mostly integers/bigint).  Each row
>>> will be refreshed every 15 minutes, or at least that's
>>> how fast I can poll via SNMP.  I hope I can tune the
>>> DB to keep up.
>>>
>>> The app is threaded and will likely have well over 100
>>> concurrent db connections.  Temp tables for storage
>>> aren't a preferred option since this is designed to be
>>> a shared nothing approach and I will likely have
>>> several polling processes.
>>>
>>
>> Somewhat OT, but..
>>
>> The easiest way to speed that up is to use less threads.  You're
>> adding a whole TON of overhead with that many threads that you just
>> don't want or need.  You should probably be using something event-
>> driven to solve this problem, with just a few database threads to
>> store all that state.  Less is definitely more in this case.  See
>> <http://www.kegel.com/c10k.html> (and there's plenty of other
>> literature out there saying that event driven is an extremely good
>> way to do this sort of thing).
>>
>> Here are some frameworks to look at for this kind of network code:
>> (Python) Twisted - <http://twistedmatrix.com/>
>> (Perl) POE - <http://poe.perl.org/>
>> (Java) java.nio (not familiar enough with the Java thing to know
>> whether or not there's a high-level wrapper)
>> (C++) ACE - <http://www.cs.wustl.edu/~schmidt/ACE.html>
>> (Ruby) IO::Reactor - <http://www.deveiate.org/code/IO-Reactor.html>
>> (C) libevent - <http://monkey.org/~provos/libevent/>
>>
>> .. and of course, you have select/poll/kqueue/WaitNextEvent/whatever
>> that you could use directly, if you wanted to roll your own solution,
>> but don't do that.
>>
>> If you don't want to optimize the whole application, I'd at least
>> just push the DB operations down to a very small number of
>> connections (*one* might even be optimal!), waiting on some kind of
>> thread-safe queue for updates from the rest of the system.  This way
>> you can easily batch those updates into transactions and you won't be
>> putting so much unnecessary synchronization overhead into your
>> application and the database.
>>
>> Generally, once you have more worker threads (or processes) than
>> CPUs, you're going to get diminishing returns in a bad way, assuming
>> those threads are making good use of their time.
>>
>> -bob
>>
>>
>
>
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


In response to

pgsql-performance by date

Next:From: Kari LavikkaDate: 2005-08-19 11:34:47
Subject: Re: Finding bottleneck
Previous:From: Mark CotnerDate: 2005-08-19 10:14:54
Subject: Re: sustained update load of 1-2k/sec

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group