
Performance considerations for very heavy INSERT traffic

From: Brandon Black <blblack(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance considerations for very heavy INSERT traffic
Date: 2005-09-12 21:04:06
Message-ID:
Lists: pgsql-performance
I'm in the process of developing an application which uses PostgreSQL for 
data storage. Our database traffic is very atypical, and as a result it has 
been rather challenging to figure out how to best tune PostgreSQL on what 
development hardware we have, as well as to figure out exactly what we 
should be evaluating and eventually buying for production hardware.

The vast, overwhelming majority of our database traffic is pretty much a 
non-stop stream of INSERTs filling up tables. It is akin to data 
acquisition. Several thousand clients are sending once-per-minute updates 
full of timestamped numerical data to our central server, which in turn 
performs INSERTs into several distinct tables as part of the transaction for 
that client. We're talking on the order of ~100 transactions per second, 
each containing INSERTs into multiple tables. Those tables contain only 
integer and floating-point columns plus a timestamp column; the primary key 
(and only index) is on the client's unique integer ID and the timestamp. 
The transaction load is spread evenly over time by having the clients send 
their per-minute updates at random times rather than exactly on the minute 
mark.
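
To make that concrete, each per-client transaction looks roughly like this 
(the table and column names here are simplified stand-ins, not our real 
schema):

-- One of several similar tables: only numeric columns plus a timestamp,
-- keyed on (client_id, sampled_at).
CREATE TABLE client_samples (
    client_id   integer           NOT NULL,
    sampled_at  timestamptz       NOT NULL,
    value_a     integer,
    value_b     double precision,
    PRIMARY KEY (client_id, sampled_at)
);

-- Each once-per-minute client update becomes one small transaction of
-- INSERTs into several such tables.
BEGIN;
INSERT INTO client_samples (client_id, sampled_at, value_a, value_b)
    VALUES (1042, now(), 17, 3.14);
-- ... similar INSERTs into the other per-client tables ...
COMMIT;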

There will of course be users using a web-based GUI to extract data from 
these tables and display them in graphs and whatnot, but the SELECT query 
traffic will always be considerably less frequent and intensive than the 
incessant INSERTs, and it's not that big a deal if the large queries take a 
little while to run.
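
A typical graphing query would just be a range scan on that primary key 
(again using the stand-in names from above):

-- Pull one client's data for the last day for graphing.
SELECT sampled_at, value_a, value_b
  FROM client_samples
 WHERE client_id = 1042
   AND sampled_at >= now() - interval '1 day'
 ORDER BY sampled_at;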

This data also expires - rows with timestamps older than X days will be 
DELETEd periodically (once an hour or faster), so the tables reach a 
relatively stable size. pg_autovacuum is handling vacuuming for now, but 
given our workload we're thinking of dropping pg_autovacuum in favor of 
having the periodic DELETE process vacuum the affected tables right after 
the DELETE, and having it vacuum the other low-traffic tables once a day 
while it's at it.
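
In terms of the stand-in table above, the hourly expiry job would be 
something like the following (with X arbitrarily set to 30 days for the 
example):

-- Expire old rows, then vacuum just the table we touched instead of
-- leaving it to pg_autovacuum.
DELETE FROM client_samples
 WHERE sampled_at < now() - interval '30 days';
VACUUM client_samples;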

There is an aggregation layer in place which proxies the inbound data from 
the clients into a small(er) number of persistent postgresql backend 
processes. Right now we're doing one aggregator per 128 clients (so instead 
of 128 separate database connections over the course of a minute for a small 
transaction each, there is a single database backend that is constantly 
committing transactions at a rate of ~2/second). At a test load of ~1,000 
clients, we would have 8 aggregators running and 8 postgresql backends. 
Testing seems to indicate we should aggregate even harder - the planned 
production load is ~5,000 clients initially, but will grow to almost double 
that in the not-too-distant future, and that would mean ~40 backends at 128 
clients each initially. Even on 8 CPUs, I'm betting 40 concurrent backends 
doing 2 tps is much worse off than 10 backends doing 8 tps.
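(Spelling out the arithmetic: 128 clients at one update per minute is 
128/60 ≈ 2.1 commits/second per backend, and ~5,000 clients / 128 clients 
per aggregator ≈ 39 backends, which is where the ~40 figure comes from. The 
total throughput is the same either way; the question is how many concurrent 
backends it gets spread across.)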

Test hardware right now is a dual Opteron with 4 GB of RAM, which we've 
barely gotten 1,000 clients running against. Current disk hardware in 
testing is whatever we could scrape together: 4x 3ware PCI hardware RAID 
controllers, with 8 SATA drives in a RAID10 array off of each, aggregated 
into a 4-way stripe with the Linux md driver and then formatted as ext3 with 
an appropriate stride parameter and data=writeback. Production will 
hopefully be a 4- to 8-way Opteron, 16 GB or more of RAM, and a Fibre 
Channel hardware RAID array or two (~1 TB available RAID10 storage) with 
15k RPM disks and battery-backed write cache.

I know I haven't provided a whole lot of application-level detail here, but 
does anyone have any general advice on tweaking postgresql to deal with a 
very heavy load of concurrent and almost exclusively write-only 
transactions? Increasing shared_buffers seems to always help, even out to 
half of the dev box's RAM (2 GB). A 100 ms commit_delay seemed to help, but 
tuning it (and commit_siblings) has been difficult. We're using 8.0 with the 
default 8k block size, but are strongly considering both developing against 
8.1 (it seems it might handle the heavy concurrency better) and recompiling 
with a 32k block size, since our storage arrays will inevitably be using 
fairly wide stripes. Any advice on any of this (other than to drop the 
project while we're still a little bit sane)?
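
For reference, in postgresql.conf terms the settings I mentioned are roughly 
(8.0 units: shared_buffers counts 8 kB buffers, commit_delay is in 
microseconds):

shared_buffers = 262144    # ~2 GB of 8 kB buffers, half the dev box's RAM
commit_delay = 100000      # 100 ms, expressed in microseconds
commit_siblings = 5        # still the default; we haven't settled on a value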


