Re: Load experimentation

From: Craig James <craig_james(at)emolecules(dot)com>
To: Ben Brehmer <benbrehmer(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Load experimentation
Date: 2009-12-07 18:50:17
Message-ID: 4B1D4E69.1010402@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Ben Brehmer wrote:
> Hello All,
>
> I'm in the process of loading a massive amount of data (500 GB). After
> some initial timings, I'm looking at 260 hours to load the entire 500GB.

You don't say how you are loading the data, so there's not much to go on. But generally, there are two primary ways to speed things up:

1. Group MANY inserts into a single transaction. If you're doing a row-at-a-time, it will be very slow. The "sweet spot" seems to be somewhere between 100 and 1000 inserts in a single transaction. Below 100, you're still slowing things down, above 1000, it probably won't make much difference.

2. Use the COPY command. This requires you to format your data into the form that COPY uses. But it's VERY fast.

Craig

> 10 days seems like an awfully long time so I'm searching for ways to
> speed this up. The load is happening in the Amazon cloud (EC2), on a
> m1.large instance:
> -7.5 GB memory
> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
> -64-bit platform
>
>
> So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The
> modifications I have made are as follows:
>
> shared_buffers = 786432
> work_mem = 10240
> maintenance_work_mem = 6291456
> max_fsm_pages = 3000000
> wal_buffers = 2048
> checkpoint_segments = 200
> checkpoint_timeout = 300
> checkpoint_warning = 30
> autovacuum = off
>
>
> There are a variety of instance types available in the Amazon cloud
> (http://aws.amazon.com/ec2/instance-types/), including high memory and
> high CPU. High memory instance types come with 34GB or 68GB of memory.
> High CPU instance types have a lot less memory (7GB max) but up to 8
> virtual cores. I am more than willing to change to any of the other
> instance types.
>
> Also, there is nothing else happening on the loading server. It is
> completely dedicated to the load.
>
> Any advice would be greatly appreciated.
>
> Thanks,
>
> Ben
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Carey 2009-12-07 19:04:02 Re: RAID card recommendation
Previous Message Ben Brehmer 2009-12-07 18:45:16 Re: Load experimentation