
Re: Getting even more insert performance (250m+rows/day)

From: "Dave Dutcher" <dave(at)tridecap(dot)com>
To: "'Daniel J(dot) Luke'" <dluke(at)geeklair(dot)net>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: Getting even more insert performance (250m+rows/day)
Date: 2006-05-24 20:02:44
Message-ID: 01f701c67f6d$0578bf40$8300a8c0@tridecap.com
Lists: pgsql-performance
If you can live with the possibility of database corruption, you could try
turning fsync off.  For example, if you could simply reinsert the data in
the unlikely event that a hardware failure corrupts the database, you might
get a decent improvement.
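
Roughly, that is a one-line change in postgresql.conf (fsync is a real
parameter; take this as a sketch of the trade-off, not a recommendation):

    # postgresql.conf
    # fsync = off stops forcing WAL writes out to disk, so an OS crash or
    # power failure can corrupt the database -- only sane if the data can
    # simply be reloaded afterwards.  The default is on.
    fsync = off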

Also, have you tried creating the index after you have inserted all your
data?  (Or does COPY already disable the indexes while inserting?)
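
Something like the following, as a rough sketch only (the table, index, and
file path here are made up, since the real schema isn't shown in the thread):

    -- Drop the index, bulk load, then build the index once at the end.
    DROP INDEX event_ts_idx;
    COPY events FROM '/path/to/days_batch.dat';
    CREATE INDEX event_ts_idx ON events (event_ts);

Building the index once over the loaded data is usually much cheaper than
maintaining it row by row during the load.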



> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org 
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of 
> Daniel J. Luke
> Sent: Wednesday, May 24, 2006 2:45 PM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] Getting even more insert performance 
> (250m+rows/day)
> 
> 
> I have a system that currently inserts ~ 250 million rows per day (I  
> have about 10k more raw data than that, but I'm at the limit of my  
> ability to get useful insert performance out of postgres).
> 
> Things I've already done that have made a big difference:
> - modified postgresql.conf shared_buffers value
> - converted to COPY from individual insert statements
> - changed BLCKSZ to 32768
> 
> I currently get ~35k/sec inserts on a table with one index (~70k/sec  
> inserts if I don't have any indexes).
> 
> The indexed field is basically a time_t (seconds since the epoch),  
> and autovacuum is running (or postgres would stop choosing to use the  
> index). The other fields have relatively lower cardinality.
> 
> Each day's worth of data gets inserted into its own table so that I  
> can expire the data without too much effort (since DROP TABLE is much  
> faster than running a DELETE and then a VACUUM).
> 
> I would really like to be able to have 1 (or 2) more indexes on the  
> table, since it takes a while for a sequential scan of 250 million  
> rows to complete, but CPU time goes way up.
> 
> In fact, it looks like I'm not currently I/O-bound, but CPU-bound. I  
> think some sort of lazy index generation (especially if it could be  
> parallelized to use the other processors/cores that currently sit  
> mostly idle) would be a solution. Is anyone working on something like  
> this? Any other ideas? Where should I look if I want to start to  
> think about creating a new index that would work this way (or am I  
> just crazy)?
> 
> Thanks for any insight!
> 
> --
> Daniel J. Luke
> +========================================================+
> | *---------------- dluke(at)geeklair(dot)net ----------------* |
> | *-------------- http://www.geeklair.net -------------* |
> +========================================================+
> |   Opinions expressed are mine and do not necessarily   |
> |          reflect the opinions of my employer.          |
> +========================================================+
> 
> 
> 
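
For what it's worth, here is a rough sketch of the per-day-table pattern
described above (all of the object names and the file path are hypothetical;
only the pattern itself comes from the thread):

    -- One table per day, bulk loaded with COPY.
    CREATE TABLE events_20060524 (LIKE events_template);
    COPY events_20060524 FROM '/path/to/20060524.dat';

    -- Expiring a day later on is a cheap catalog operation,
    -- unlike a DELETE followed by a VACUUM.
    DROP TABLE events_20060517;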

