Re: Configuration Advice

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Steve <cheetah(at)tanabi(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Configuration Advice
Date: 2007-01-17 23:43:44
Message-ID: 53CA867C-4255-4504-9F51-6B5B42F925A4@fastcrypt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On 17-Jan-07, at 3:41 PM, Steve wrote:

> Hey there;
>
> I've been lurking on this list awhile, and I've been working with
> postgres for a number of years so I'm not exactly new to this. But
> I'm still having trouble getting a good balance of settings and I'd
> like to see what other people think. We may also be willing to
> hire a contractor to help tackle this problem if anyone is interested.
>
> I've got an application here that runs large (in terms of length --
> the queries have a lot of conditions in them) queries that can
> potentially return millions of rows but on average probably return
> tens of thousands of rows. It's read only for most of the day, and
> pretty much all the queries except one are really fast.
>
> However, each night we load data from a legacy cobol system into
> the SQL system and then we summarize that data to make the reports
> faster. This load process is intensely insert/update driven but
> also has a hefty amount of selects as well. This load process is
> taking ever longer to complete.
>
>
> SO ... our goal here is to make this load process take less time.
> It seems the big part is building the big summary table; this big
> summary table is currently 9 million rows big. Every night, we
> drop the table, re-create it, build the 9 million rows of data (we
> use COPY to put hte data in when it's prepared, not INSERT), and
> then build the indexes on it -- of which there are many.
> Unfortunately this table gets queried in a lot of different ways
> and needs these indexes; also unfortunately, we have operator class
> indexes to support both ASC and DESC sorting on columns so these
> are for all intents and purposes duplicate but required under
> Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this
> still a requirement?)
>
> Building these indexes takes forever! It's a long grind through
> inserts and then building the indexes takes a hefty amount of time
> too. (about 9 hours). Now, the application is likely part at
> fault, and we're working to make it more efficient, but it has
> nothing to do with the index building time. I'm wondering what we
> can do to make this better if anything; would it be better to leave
> the indexes on? It doesn't seem to be. Would it be better to use
> INSERTs instead of copies? Doesn't seem to be.
>
>
> Anyway -- ANYTHING we can do to make this go faster is
> appreciated :) Here's some vital statistics:
>
> - Machine is a 16 GB, 4 actual CPU dual-core opteron system using
> SCSI discs. The disc configuration seems to be a good one, it's
> the best of all the ones we've tested so far.
>
The basic problem here is simply writing all the data to disk. you
are building 9M rows of data plus numerous index's. How much data are
you actually writing to the disk. Try looking at iostat while this is
going on.

My guess is you are maxing out the disk write speed.
> - The load process itself takes about 6 gigs of memory, the rest is
> free for postgres because this is basically all the machine does.
>
> - If this was your machine and situation, how would you lay out the
> emmory settings? What would you set the FSM to? Would you leave
> teh bgwriter on or off? We've already got FSYNC off because "data
> integrity" doesn't matter -- this stuff is religeously backed up
> and we've got no problem reinstalling it. Besides, in order for
> this machine to go down, data integrity of the DB is the least of
> the worries :)
>
> Do wal_buffers/full_page_writes matter of FSYNC is off? If so,
> what settings? What about checkpoints?
>
Not reallly, I'd have WAL buffers write to a ram disk
> Any finally, any ideas on planner constants? Here's what I'm using:
>
> seq_page_cost = 0.5 # measured on an arbitrary
> scale
> random_page_cost = 1.0 # same scale as above
> cpu_tuple_cost = 0.001 # same scale as above
> cpu_index_tuple_cost = 0.0001 # same scale as above
> cpu_operator_cost = 0.00025 # same scale as above
> effective_cache_size = 679006
>

as a general rule make shared buffers about 25% of free mem,
effective cache 75% but with a write intensive load like you have I
think the first thing to look at is write speed.
> I really don't remember how I came up with that
> effective_cache_size number....
>
>
> Anyway... any advice would be appreciated :)
>
>
> Steve
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Steve 2007-01-17 23:55:20 Re: Configuration Advice
Previous Message Adam Rich 2007-01-17 23:37:48 Re: Configuration Advice