Re: Performance considerations for very heavy INSERT traffic

From: Alex Turner <armtuk(at)gmail(dot)com>
To: blblack(at)gmail(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance considerations for very heavy INSERT traffic
Date: 2005-09-12 21:45:21
Message-ID: 33c6269f05091214459a87345@mail.gmail.com
Lists: pgsql-performance

Split your system into multiple RAID 10 partitions. For maximum
performance, use a ten-drive RAID 10 for pg_xlog (this will max out a
PCI-X bus) on bus A, and multiple 4- or 6-drive RAID 10s for tablespaces
on bus B. For maximum performance I would recommend one RAID 10 for raw
data tables, one for aggregate tables, and one for indexes. More RAM
will only help you with queries against your data; if you are
pre-aggregating, you may not need all that much RAM.
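
Something along these lines is what I have in mind. The mount points,
table definitions and names below are only placeholders, assuming each
RAID 10 is mounted as its own filesystem:

    -- One tablespace per RAID 10 array (the directories must already
    -- exist and be owned by the postgres user).
    CREATE TABLESPACE raw_space LOCATION '/mnt/raid10_raw';
    CREATE TABLESPACE agg_space LOCATION '/mnt/raid10_agg';
    CREATE TABLESPACE idx_space LOCATION '/mnt/raid10_idx';

    -- Raw data tables on one array, aggregate tables on another,
    -- indexes on the third.
    CREATE TABLE raw_samples (
        client_id integer          NOT NULL,
        ts        timestamptz      NOT NULL,
        val       double precision
    ) TABLESPACE raw_space;

    CREATE TABLE hourly_agg (
        client_id integer          NOT NULL,
        hour      timestamptz      NOT NULL,
        avg_val   double precision
    ) TABLESPACE agg_space;

    CREATE UNIQUE INDEX raw_samples_key
        ON raw_samples (client_id, ts) TABLESPACE idx_space;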

You can easily get 100 transactions per second on even less hardware
with a little data partitioning.

Choose your controller carefully, as many don't cooperate well with Linux.

Alex Turner
NetEconomist

On 9/12/05, Brandon Black <blblack(at)gmail(dot)com> wrote:
>
>
> I'm in the process of developing an application which uses PostgreSQL for
> data storage. Our database traffic is very atypical, and as a result it has
> been rather challenging to figure out how to best tune PostgreSQL on what
> development hardware we have, as well as to figure out exactly what we
> should be evaluating and eventually buying for production hardware.
>
> The vast, overwhelming majority of our database traffic is pretty much a
> non-stop stream of INSERTs filling up tables. It is akin to data
> acquisition. Several thousand clients are sending once-per-minute updates
> full of timestamped numerical data at our central server, which in turn
> performs INSERTs into several distinct tables as part of the transaction for
> that client. We're talking on the order of ~100 transactions per second,
> each containing INSERTs to multiple tables (which contain only integer and
> floating point columns and a timestamp column - the primary key (and only
> index) is on a unique integer ID for the client and the timestamp). The
> transaction load is spread evenly over time by having the clients send their
> per-minute updates at random times rather than on the exact minute mark.
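
For reference, a minimal sketch of the kind of per-client tables and
transaction described here (the table names and columns are invented;
only the shape follows the description above: numeric columns plus a
timestamp, with the composite primary key as the only index):

    CREATE TABLE client_samples (
        client_id integer          NOT NULL,
        ts        timestamptz      NOT NULL,
        val_a     double precision,
        val_b     double precision,
        PRIMARY KEY (client_id, ts)
    );

    CREATE TABLE client_counters (
        client_id integer     NOT NULL,
        ts        timestamptz NOT NULL,
        counter_a integer,
        PRIMARY KEY (client_id, ts)
    );

    -- One client's once-per-minute update: INSERTs into several
    -- distinct tables committed as a single transaction.
    BEGIN;
    INSERT INTO client_samples  VALUES (1234, now(), 0.5, 12.75);
    INSERT INTO client_counters VALUES (1234, now(), 42);
    COMMIT;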
>
> There will of course be users using a web-based GUI to extract data from
> these tables and display them in graphs and whatnot, but the SELECT query
> traffic will always be considerably less frequent and intensive than the
> incessant INSERTs, and it's not that big a deal if the large queries take a
> little while to run.
>
> This data also expires - rows with timestamps older than X days will be
> DELETEd periodically (once an hour or faster), such that the tables will
> reach a relatively stable size (pg_autovacuum is handling vacuuming for now,
> but considering our case, we're thinking of killing pg_autovacuum in favor
> of having the periodic DELETE process also do a vacuum of affected tables
> right after the DELETE, and then have it vacuum the other low traffic tables
> once a day while it's at it).
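
The expiry pattern described above would look roughly like this (the
table name matches the earlier sketch, and the 30-day cutoff is just a
stand-in for X):

    -- Expire old rows, then vacuum right away so the dead tuples can
    -- be reused by the incoming INSERT stream.
    DELETE FROM client_samples WHERE ts < now() - interval '30 days';
    VACUUM ANALYZE client_samples;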
>
> There is an aggregation layer in place which proxies the inbound data from
> the clients into a small(er) number of persistent postgresql backend
> processes. Right now we're doing one aggregator per 128 clients (so instead
> of 128 separate database connections over the course of a minute for a small
> transaction each, there is a single database backend that is constantly
> committing transactions at a rate of ~ 2/second). At a test load of ~1,000
> clients, we would have 8 aggregators running and 8 postgresql backends.
> Testing has seemed to indicate we should aggregate even harder - the planned
> production load is ~5,000 clients initially, but will grow to almost double
> that in the not-too-distant future, and that would mean ~40 backends at 128
> clients each initially. Even on 8 cpus, I'm betting 40 concurrent backends
> doing 2 tps is much worse off than 10 backends doing 8 tps.
>
> Test hardware right now is a dual Opteron with 4G of ram, which we've
> barely gotten 1,000 clients running against. Current disk hardware in
> testing is whatever we could scrape together (4x 3ware PCI hardware RAID
> controllers, with 8 SATA drives in a RAID10 array off of each - aggregated
> up in a 4-way stripe with linux md driver and then formatted as ext3 with an
> appropriate stride parameter and data=writeback). Production will hopefully
> be a 4-8-way Opteron, 16 or more G of RAM, and a Fibre Channel hardware RAID
> array or two (~ 1TB available RAID10 storage) with 15krpm disks and
> battery-backed write cache.
>
> I know I haven't provided a whole lot of application-level detail here,
> but does anyone have any general advice on tweaking postgresql to deal with
> a very heavy load of concurrent and almost exclusively write-only
> transactions? Increasing shared_buffers seems to always help, even out to
> half of the dev box's ram (2G). A 100ms commit_delay seemed to help, but
> tuning it (and _siblings) has been difficult. We're using 8.0 with the
> default 8k blocksize, but are strongly considering both developing against
> 8.1 (seems it might handle the heavy concurrency better), and re-compiling
> with 32k blocksize since our storage arrays will inevitably be using fairly
> wide stripes. Any advice on any of this (other than drop the project while
> you're still a little bit sane)?
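
For what it's worth, the knobs mentioned above map to roughly the
following (the values are just the ones quoted, not recommendations):

    -- commit_delay is measured in microseconds, so 100 ms is 100000;
    -- both of these can be tried per-session before touching
    -- postgresql.conf.
    SET commit_delay    = 100000;
    SET commit_siblings = 5;       -- 5 is the 8.0 default
    -- shared_buffers can only be changed in postgresql.conf (restart
    -- required); on 8.0 it is a count of 8 kB buffers, so ~2 GB is
    -- roughly 262144.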
>
> --Brandon
>
>
