Load is chugging along. We've optimized our postgres conf as much as
possible but are seeing the inevitable I/O bottleneck. I had the same
thought as you (converting the INSERTs into COPYs) a while back, but
unfortunately each file has many inserts into many different tables.
Potentially I could rip through this with a little MapReduce job on
50-100 nodes, which is still something I might do.
One thought we are playing with was taking advantage of 4 x 414GB EBS
devices in a RAID0 configuration. This would spread disk writes across 4
devices.
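A setup along those lines might look roughly like the following (a sketch only; the device names and mount point are assumptions, since EBS attachment points vary per instance):

```shell
# Hedged sketch: stripe four attached EBS volumes into one RAID0 array
# with mdadm and put the PostgreSQL data directory on it. The /dev/sd*
# names below are assumptions -- check /proc/partitions on the instance.
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/sdf /dev/sdg /dev/sdh /dev/sdi

# Make a filesystem on the array and mount it where postgres will live.
mkfs -t ext3 /dev/md0
mkdir -p /mnt/pgdata
mount /dev/md0 /mnt/pgdata
```

RAID0 here buys write throughput only; a single failed EBS volume loses the whole array, which may be acceptable for a scratch load target.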
Right now I'm wrapping about 1500 inserts in a transaction block. Since
it's an I/O bottleneck, COPY statements might not give me much advantage.
It's definitely a work in progress :)
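For reference, the batching pattern described above looks roughly like this. It is sketched with Python's built-in sqlite3 purely so the example is self-contained; the actual load targets PostgreSQL (e.g. via a psycopg connection), where the commit pattern is identical:

```python
import sqlite3

# Sketch of the batching pattern: commit once per ~1500 rows instead of
# once per INSERT, so each row doesn't pay its own sync to disk.
# Table name and columns here are illustrative, not from the real schema.
BATCH_SIZE = 1500

def load_rows(conn, rows):
    """Insert rows, committing in transaction blocks of BATCH_SIZE."""
    cur = conn.cursor()
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO t (a, b) VALUES (?, ?)", row)
        pending += 1
        if pending >= BATCH_SIZE:
            conn.commit()   # close one transaction block, start the next
            pending = 0
    if pending:
        conn.commit()       # flush the final partial batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
load_rows(conn, ((i, str(i)) for i in range(4000)))
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 4000
```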
On 09/12/2009 5:31 AM, Andy Colson wrote:
> On 12/07/2009 12:12 PM, Ben Brehmer wrote:
>> Hello All,
>> I'm in the process of loading a massive amount of data (500 GB). After
>> some initial timings, I'm looking at 260 hours to load the entire 500GB.
>> 10 days seems like an awfully long time so I'm searching for ways to
>> speed this up. The load is happening in the Amazon cloud (EC2), on a
>> m1.large instance:
>> -7.5 GB memory
>> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
>> -64-bit platform
>> So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The
>> modifications I have made are as follows:
>> shared_buffers = 786432
>> work_mem = 10240
>> maintenance_work_mem = 6291456
>> max_fsm_pages = 3000000
>> wal_buffers = 2048
>> checkpoint_segments = 200
>> checkpoint_timeout = 300
>> checkpoint_warning = 30
>> autovacuum = off
>> There are a variety of instance types available in the Amazon cloud
>> (http://aws.amazon.com/ec2/instance-types/), including high memory and
>> high CPU. High memory instance types come with 34GB or 68GB of memory.
>> High CPU instance types have a lot less memory (7GB max) but up to 8
>> virtual cores. I am more than willing to change to any of the other
>> instance types.
>> Also, there is nothing else happening on the loading server. It is
>> completely dedicated to the load.
>> Any advice would be greatly appreciated.
> I'm kind of curious, how goes the load? Is it done yet? Still
> looking at days'n'days to finish?
> I was thinking... if the .sql files are really nicely formatted, it
> would not be too hard to whip up a Perl script to run as a filter to
> change the INSERT statements into COPYs.
> Each file would have to only fill one table, and only contain inserts,
> and all the insert statements would have to set the same fields. (And
> I'm sure there could be other problems).
> Also, just for the load, did you disable fsync?
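A filter along the lines Andy describes could be sketched as below (in Python rather than the suggested Perl). It assumes exactly the constraints he lists: one table per file, single-row INSERTs, same column list throughout, and simple literals with no embedded commas, quotes, or newlines; real dump files would need a proper SQL tokenizer:

```python
import re

# Hedged sketch of an INSERT -> COPY filter. The regex assumes
# well-formed single-row statements like:
#   INSERT INTO t (a, b) VALUES (1, 'x');
INSERT_RE = re.compile(
    r"INSERT INTO (\w+) \(([^)]*)\) VALUES \((.*)\);\s*$", re.IGNORECASE)

def inserts_to_copy(lines):
    """Yield one COPY ... FROM stdin block equivalent to the INSERTs."""
    header_done = False
    for line in lines:
        m = INSERT_RE.match(line)
        if not m:
            continue  # skip anything that isn't a recognized INSERT
        table, cols, values = m.groups()
        if not header_done:
            yield f"COPY {table} ({cols}) FROM stdin;"
            header_done = True
        # Split on commas, strip one level of single quotes, and map
        # NULL to \N (COPY's default null marker in text format).
        fields = []
        for v in (f.strip() for f in values.split(",")):
            if v.upper() == "NULL":
                fields.append(r"\N")
            elif v.startswith("'") and v.endswith("'"):
                fields.append(v[1:-1])
            else:
                fields.append(v)
        yield "\t".join(fields)
    if header_done:
        yield "\\."  # COPY end-of-data marker

demo = [
    "INSERT INTO users (id, name) VALUES (1, 'ann');",
    "INSERT INTO users (id, name) VALUES (2, NULL);",
]
print("\n".join(inserts_to_copy(demo)))
```

Run over a whole file, this emits a single COPY block that psql can feed straight to the server, avoiding per-statement parse and plan overhead.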