Re: Getting even more insert performance (250m+rows/day)

From: "Dave Dutcher" <dave(at)tridecap(dot)com>
To: "'Daniel J(dot) Luke'" <dluke(at)geeklair(dot)net>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Getting even more insert performance (250m+rows/day)
Date: 2006-05-24 20:02:44
Message-ID: 01f701c67f6d$0578bf40$8300a8c0@tridecap.com
Lists: pgsql-performance


If you can live with possible database corruption, you could try turning
fsync off. For example, if you could simply reinsert the data in the
unlikely event that a crash or hardware failure corrupts the database, you
might get a decent improvement.
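
If you do want to experiment with that, it is a one-line change in
postgresql.conf (a sketch only; the reload command and data directory path
are just examples, and with fsync off a crash or power failure can leave
the cluster corrupted):

    # postgresql.conf -- trades crash safety for write speed
    fsync = off

    # then have the server re-read its configuration, e.g.:
    #   pg_ctl reload -D /path/to/data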

Also, have you tried creating the index after you have inserted all of
your data? (Or does COPY already disable the indexes while inserting?)
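
If it doesn't, something along these lines might be worth timing (a sketch
only; the table, column, and index names are made up for illustration):

    -- drop the index, bulk load, then build the index once at the end
    DROP INDEX raw_data_ts_idx;

    -- ... run the normal COPY-based load here ...

    CREATE INDEX raw_data_ts_idx ON raw_data (ts);
    ANALYZE raw_data;    -- refresh planner statistics after the load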

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
> Daniel J. Luke
> Sent: Wednesday, May 24, 2006 2:45 PM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] Getting even more insert performance
> (250m+rows/day)
>
>
> I have a system that currently inserts ~ 250 million rows per day (I
> have about 10k more raw data than that, but I'm at the limit of my
> ability to get useful insert performance out of postgres).
>
> Things I've already done that have made a big difference:
> - modified postgresql.conf shared_buffers value
> - converted to COPY from individual insert statements
> - changed BLCKSZ to 32768
>
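For reference, a COPY-based load along those lines might look roughly like
this (a sketch; the table, columns, and file path are placeholders):

    -- server-side load from a file the backend can read
    COPY raw_data (ts, source_id, value)
        FROM '/path/to/batch.csv' WITH CSV;

    -- or client-side via psql, without needing filesystem access
    -- on the server:
    -- \copy raw_data (ts, source_id, value) FROM 'batch.csv' WITH CSV
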
> I currently get ~35k/sec inserts on a table with one index (~70k/sec
> inserts if I don't have any indexes).
>
> The indexed field is basically a time_t (seconds since the epoch);
> autovacuum is running (or postgres would stop choosing to use the
> index). The other fields have comparatively low cardinality.
>
> Each day's worth of data gets inserted into its own table so that I
> can expire the data without too much effort (since DROP TABLE is much
> faster than running a DELETE and then a VACUUM).
>
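That rotation can stay very cheap; per day it might be as little as the
following (a sketch; the table names, the index, and the dates are
placeholders):

    -- create the new day's table (and its index) before the load
    CREATE TABLE raw_data_20060524 (LIKE raw_data);
    CREATE INDEX raw_data_20060524_ts_idx
        ON raw_data_20060524 (ts);

    -- expiring old data is then just a catalog operation
    DROP TABLE raw_data_20060517;
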
> I would really like to have one (or two) more indexes on the table,
> since it takes a while for a sequential scan of 250 million rows to
> complete, but with the extra indexes the insert CPU time goes way up.
>
> In fact, it looks like I'm not currently I/O-bound but CPU-bound. I
> think some sort of lazy index generation (especially if it could be
> parallelized to use the other processors/cores that currently sit
> mostly idle) would be a solution. Is anyone working on something like
> this? Any other ideas? Where should I look if I want to start thinking
> about creating a new index that would work this way (or am I just
> crazy)?
>
> Thanks for any insight!
>
> --
> Daniel J. Luke
> +========================================================+
> | *---------------- dluke(at)geeklair(dot)net ----------------* |
> | *-------------- http://www.geeklair.net -------------* |
> +========================================================+
> | Opinions expressed are mine and do not necessarily |
> | reflect the opinions of my employer. |
> +========================================================+
>
>
>
