Re: Load experimentation

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Ben Brehmer <benbrehmer(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Thom Brown <thombrown(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "craig_james(at)emolecules(dot)com" <craig_james(at)emolecules(dot)com>, "kbuckham(at)applocation(dot)net" <kbuckham(at)applocation(dot)net>, "scott(dot)lists(at)enterprisedb(dot)com" <scott(dot)lists(at)enterprisedb(dot)com>
Subject: Re: Load experimentation
Date: 2009-12-10 23:29:59
Message-ID: C746C477.1B046%scott@richrelevance.com
Lists: pgsql-performance

On 12/7/09 11:12 AM, "Ben Brehmer" <benbrehmer(at)gmail(dot)com> wrote:

> Thanks for the quick responses. I will respond to all questions in one email:
>
> COPY command: Unfortunately I'm stuck with INSERTS due to the nature this data
> was generated (Hadoop/MapReduce).

If you have control over the MapReduce output, you can have it write its
result files in a format that COPY accepts.
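
For example, if the job writes plain tab-delimited text (the default for
Hadoop's TextOutputFormat), each part file can be loaded with a bare COPY.
The table name and path below are just placeholders:

COPY my_table FROM '/data/output/part-00000';

From a remote client, psql's \copy does the same thing but reads the file
on the client side instead of requiring it on the server.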

If you don't have any control over that, it's more complicated. I use a
final-pass, map-only Hadoop job to go over the output and insert into
Postgres directly from the job, using the multi-row insert style:

INSERT INTO <table> VALUES (val1, val2, ...), (val1, val2, ...), ...

from Java, with about 80 rows per INSERT statement and a single transaction
for about a thousand of these. This was faster than batch inserts.
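
Roughly, that loop looks like the following (a minimal sketch of the idea,
not the actual job code; the JDBC URL, credentials, table name my_table,
and two-column row shape are all placeholder assumptions):

// Minimal sketch: multi-row INSERTs from Java over JDBC, ~80 rows per
// statement, ~1000 statements per transaction. Table "my_table", the
// connection details, and the row shape are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Iterator;
import java.util.List;

public class MultiRowInsertLoader {
    private static final int ROWS_PER_INSERT = 80;
    private static final int INSERTS_PER_COMMIT = 1000;

    public static void load(Iterator<List<String>> rows) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/mydb", "user", "password");
             Statement stmt = conn.createStatement()) {
            conn.setAutoCommit(false);  // commit manually, one tx per ~1000 statements
            int statementsInTx = 0;
            StringBuilder sql = new StringBuilder();
            while (rows.hasNext()) {
                sql.setLength(0);
                sql.append("INSERT INTO my_table VALUES ");
                for (int i = 0; i < ROWS_PER_INSERT && rows.hasNext(); i++) {
                    List<String> row = rows.next();
                    if (i > 0) sql.append(", ");
                    // Quick-and-dirty escaping; real code should parameterize values.
                    sql.append("('").append(row.get(0).replace("'", "''"))
                       .append("', '").append(row.get(1).replace("'", "''")).append("')");
                }
                stmt.executeUpdate(sql.toString());
                if (++statementsInTx >= INSERTS_PER_COMMIT) {
                    conn.commit();
                    statementsInTx = 0;
                }
            }
            conn.commit();  // flush the last partial transaction
        }
    }
}

In real code the values should go through a PreparedStatement rather than
string concatenation; the sketch just shows the batching shape.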

>
> On 07/12/2009 10:39 AM, Thom Brown wrote:
>>
>> 2009/12/7 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
>>
>>>
>>> Ben Brehmer <benbrehmer(at)gmail(dot)com> wrote:
>>>
>>>> -7.5 GB memory
>>>> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units
>>>> each)
>>>> -64-bit platform
>>>
>>>
>>> What OS?
>>>
>>>> (PostgreSQL 8.1.3)
>>>
>>> Why use such an antiquated, buggy version? Newer versions are
>>> faster.
>>>
>>> -Kevin
>>>
>>
>> I'd agree with trying to use the latest version you can.
>>
>> How are you loading this data? I'd make sure you haven't got any indices,
>> primary keys, triggers or constraints on your tables before you begin the
>> initial load, just add them after. Also use either the COPY command for
>> loading, or prepared statements. Individual insert commands will just take
>> way too long.
>>
>> Regards
>>
>> Thom
>
