Skip site navigation (1) Skip section navigation (2)

Re: Load experimentation

From: Craig James <craig_james(at)emolecules(dot)com>
To: Ben Brehmer <benbrehmer(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Load experimentation
Date: 2009-12-07 19:21:22
Message-ID: 4B1D55B2.9080601@emolecules.com (view raw or flat)
Thread:
Lists: pgsql-performance
Ben Brehmer wrote:
> Thanks for the quick responses. I will respond to all questions in one 
> email:
> 
> By "Loading data" I am implying: "psql -U postgres -d somedatabase -f 
> sql_file.sql".  The sql_file.sql contains table creates and insert 
> statements. There are no indexes present nor created during the load.

Although transactions of over 1000 INSERT statements don't speed things up much, they don't hurt either, especially on a new system that nobody is using yet.  Since you're loading from big SQL files using psql, just put a "begin;" at the top of the file and a "commit;" at the bottom.  Unlike Oracle, Postgres even allows CREATE and such to be done inside a transaction.

And BTW, don't forget to ANALYZE when you're all done.

Craig

> 
> OS: x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 
> (Red Hat 4.1.2-44)
> 
> PostgreSQL: I will try upgrading to latest version.
> 
> COPY command: Unfortunately I'm stuck with INSERTS due to the nature 
> this data was generated (Hadoop/MapReduce).
> 
> Transactions: Have started a second load process with chunks of 1000 
> inserts wrapped in a transaction. Its dropped the load time for 1000 
> inserts from 1 Hour to 7 minutes :)
> 
> Disk Setup: Using a single disk Amazon image for the destination 
> (database). Source is coming from an EBS volume. I didn't think there 
> were any disk options in Amazon?
> 
> 
> Thanks!
> 
> Ben
> 
> 
> 
> 
> 
> On 07/12/2009 10:39 AM, Thom Brown wrote:
>> 2009/12/7 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov 
>> <mailto:Kevin(dot)Grittner(at)wicourts(dot)gov>>
>>
>>     Ben Brehmer <benbrehmer(at)gmail(dot)com <mailto:benbrehmer(at)gmail(dot)com>>
>>     wrote:
>>
>>     > -7.5 GB memory
>>     > -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units
>>     >    each)
>>     > -64-bit platform
>>
>>     What OS?
>>
>>     > (PostgreSQL 8.1.3)
>>
>>     Why use such an antiquated, buggy version?  Newer versions are
>>     faster.
>>
>>     -Kevin
>>
>>
>>
>> I'd agree with trying to use the latest version you can.
>>
>> How are you loading this data?  I'd make sure you haven't got any 
>> indices, primary keys, triggers or constraints on your tables before 
>> you begin the initial load, just add them after.  Also use either the 
>> COPY command for loading, or prepared transactions.  Individual insert 
>> commands will just take way too long.
>>
>> Regards
>>
>> Thom


In response to

pgsql-performance by date

Next:From: Karl DenningerDate: 2009-12-07 19:47:33
Subject: Re: RAID card recommendation
Previous:From: Ben BrehmerDate: 2009-12-07 19:12:12
Subject: Re: Load experimentation

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group