Re: sustained update load of 1-2k/sec

From: Mark Cotner <mcotner(at)yahoo(dot)com>
To: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: sustained update load of 1-2k/sec
Date: 2005-08-22 09:29:12
Message-ID: BF2F1328.EBAA%mcotner@yahoo.com
Lists: pgsql-performance

Thanks again everyone for the excellent suggestions.

I looked into IO::Reactor, but after a few hours of fiddling I found I was
getting the performance I wanted from a somewhat-more-than-modest number of
threads, so (given dev timelines) I decided to come back later and patch the
Ruby SNMP libraries to do async polling with Reactor.

I am unfortunately stuck with updates, but I think (with your suggestions)
I've made it work for me.

MySQL = 1500 updates/sec
PostgreSQL w/ 10k updates per commit, single thread = 1400 updates/sec

Given the update-heavy nature of this table I felt it was necessary to test
during a vacuum. Turns out the hit wasn't that bad:

PostgreSQL w/ 10k updates per commit, single thread, during a vacuum = 1300
updates/sec
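
For anyone curious what the batching looks like, here's a minimal sketch of
the single-thread, ~10k-updates-per-commit loop I tested. It assumes the
modern Ruby pg gem, and the table, columns, and poll data are placeholders
rather than our real schema:

require 'pg'

# Batch many UPDATEs into one transaction so the commit/fsync cost is
# amortized across ~10k statements instead of being paid per row.
BATCH_SIZE = 10_000

conn = PG.connect(dbname: 'poller')

def flush_batch(conn, rows)
  conn.transaction do |tx|
    rows.each do |modem_id, in_octets, out_octets|
      tx.exec_params(
        'UPDATE modem_state SET in_octets = $2, out_octets = $3,
                polled_at = now()
          WHERE modem_id = $1',
        [modem_id, in_octets, out_octets])
    end
  end
end

# Stand-in for whatever the SNMP pollers hand back.
poll_results = [[1, 100, 200], [2, 150, 250]]

batch = []
poll_results.each do |row|
  batch << row
  if batch.size >= BATCH_SIZE
    flush_batch(conn, batch)
    batch.clear
  end
end
flush_batch(conn, batch) unless batch.empty?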

100-200 updates/sec is a small price to pay for mature stored procedures,
more stored procedure language options, ACID compliance, MVCC, very few if
any corrupt tables (I get about 2 a week from MySQL on the 40 DBs I manage),
a more crash-resistant DB (about one crash a month on one of my 40 MySQL DBs),
and replication that actually works for more than a day before quitting for
no apparent reason ;) [/flame off]

For those of you with Cox Communications cable modems, look forward to better
customer service and cable plant management. :)

And if anyone's curious, here's the app I'm rebuilding/updating:
http://www.mysql.com/customers/customer.php?id=16
We were runner-up behind the Sabre airline reservation system for MySQL app of
the year. Needless to say, they weren't too happy when they heard we might
be switching DBs.

'njoy,
Mark

On 8/19/05 1:12 PM, "J. Andrew Rogers" <jrogers(at)neopolitan(dot)com> wrote:

> On 8/19/05 1:24 AM, "Mark Cotner" <mcotner(at)yahoo(dot)com> wrote:
>> I'm currently working on an application that will poll
>> thousands of cable modems per minute and I would like
>> to use PostgreSQL to maintain state between polls of
>> each device. This requires a very heavy amount of
>> updates in place on a reasonably large table(100k-500k
>> rows, ~7 columns mostly integers/bigint). Each row
>> will be refreshed every 15 minutes, or at least that's
>> how fast I can poll via SNMP. I hope I can tune the
>> DB to keep up.
>>
>> The app is threaded and will likely have well over 100
>> concurrent db connections. Temp tables for storage
>> aren't a preferred option since this is designed to be
>> a shared nothing approach and I will likely have
>> several polling processes.
>
>
> Mark,
>
> We have PostgreSQL databases on modest hardware doing exactly what you are
> attempting to (massive scalable SNMP monitoring system). The monitoring
> volume for a single database server appears to exceed what you are trying to
> do by a few orders of magnitude with no scaling or performance issues, so I
> can state without reservation that PostgreSQL can easily handle your
> application in theory.
>
> However, that is predicated on having a well-architected system that
> minimizes resource contention and unnecessary blocking, and based on your
> description you may be going about it a bit wrong.
>
> The biggest obvious bottleneck is the use of threads and massive
> process-level parallelization. As others have pointed out, async queues are
> your friends, as is partitioning the workload horizontally rather than
> vertically through the app stack. A very scalable high-throughput engine
> for SNMP polling only requires two or three threads handling different parts
> of the workload to saturate the network, and by choosing what each thread
> does carefully you can all but eliminate blocking when there is work to be
> done.
>
> We only use a single database connection to insert all the data into
> PostgreSQL, and that process/thread receives its data from a work queue.
> Depending on how you design your system, you can batch many records in your
> queue as a single transaction. In our case, we also use very few updates,
> mostly just inserts, which is probably advantageous in terms of throughput
> if you have the disk for it. The insert I/O load is easily handled, and our
> disk array is a modest 10k SCSI rig. The only thing that really hammers the
> server is when multiple reporting processes are running, which frequently
> touch several million rows each (the database is much larger than the system
> memory), and even this is manageable with clever database design.
>
>
> In short, what you are trying to do is easily doable on PostgreSQL in
> theory. However, restrictions on design choices may pose significant
> hurdles. We did not start out with an ideal system either; it took a fair
> amount of re-engineering to solve all the bottlenecks and problems that pop
> up.
>
> Good luck,
>
> J. Andrew Rogers
> jrogers(at)neopolitan(dot)com
>
>
>
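
For completeness, here's a rough sketch of the single-connection writer fed
by a work queue that Andrew describes above. The queue size, thread count,
table, and columns are assumptions for illustration, and again it's written
against the modern Ruby pg gem:

require 'pg'

# Poller threads only push results onto a bounded queue; they never touch
# the database. One writer thread owns the only PostgreSQL connection and
# commits whatever has accumulated as a single transaction.
queue = SizedQueue.new(50_000)

pollers = 4.times.map do |i|
  Thread.new do
    # ... poll this thread's slice of the modems via SNMP, then for each
    # result: queue << [modem_id, in_octets, out_octets]
  end
end

writer = Thread.new do
  conn = PG.connect(dbname: 'poller')
  loop do
    rows = [queue.pop]                     # block until there is work
    rows << queue.pop until queue.empty?   # then drain whatever is queued
    conn.transaction do |tx|
      rows.each do |modem_id, in_octets, out_octets|
        tx.exec_params(
          'UPDATE modem_state SET in_octets = $2, out_octets = $3
            WHERE modem_id = $1',
          [modem_id, in_octets, out_octets])
      end
    end
  end
end

pollers.each(&:join)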
