Skip site navigation (1) Skip section navigation (2)

Getting even more insert performance (250m+rows/day)

From: "Daniel J(dot) Luke" <dluke(at)geeklair(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Getting even more insert performance (250m+rows/day)
Date: 2006-05-24 19:45:17
Message-ID: CE96DEF6-C455-4BB3-8B57-A78849A519E8@geeklair.net (view raw or flat)
Thread:
Lists: pgsql-performance
I have a system that currently inserts ~ 250 million rows per day (I  
have about 10k more raw data than that, but I'm at the limit of my  
ability to get useful insert performance out of postgres).

Things I've already done that have made a big difference:
- modified postgresql.conf shared_buffers value
- converted to COPY from individual insert statements
- changed BLCKSZ to 32768

I currently get ~35k/sec inserts on a table with one index (~70k/sec  
inserts if I don't have any indexes).

The indexed field is basically a time_t (seconds since the epoch),  
autovacuum is running (or postgres would stop choosing to use the  
index). The other fields have relatively lower cardinality.

Each days worth of data gets inserted into its own table so that I  
can expire the data without too much effort (since drop table is much  
faster than running a delete and then vacuum).

I would really like to be able to have 1 (or 2) more indexes on the  
table since it takes a while for a sequential scan of 250million rows  
to complete, but CPU time goes way up.

In fact, it looks like I'm not currently IO bound, but CPU-bound. I  
think some sort of lazy-index generation (especially if it could be  
parallelized to use the other processors/cores that currently sit  
mostly idle) would be a solution. Is anyone working on something like  
this? Any other ideas? Where should I look if I want to start to  
think about creating a new index that would work this way (or am I  
just crazy)?

Thanks for any insight!

--
Daniel J. Luke
+========================================================+
| *---------------- dluke(at)geeklair(dot)net ----------------* |
| *-------------- http://www.geeklair.net -------------* |
+========================================================+
|   Opinions expressed are mine and do not necessarily   |
|          reflect the opinions of my employer.          |
+========================================================+


Responses

pgsql-performance by date

Next:From: Dave DutcherDate: 2006-05-24 20:02:44
Subject: Re: Getting even more insert performance (250m+rows/day)
Previous:From: kynnDate: 2006-05-24 17:42:46
Subject: Optimizing a huge_table/tiny_table join

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group