From: Scott Carey <scott(at)richrelevance(dot)com>
To: Ben Brehmer <benbrehmer(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Thom Brown <thombrown(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "craig_james(at)emolecules(dot)com" <craig_james(at)emolecules(dot)com>, "kbuckham(at)applocation(dot)net" <kbuckham(at)applocation(dot)net>, "scott(dot)lists(at)enterprisedb(dot)com" <scott(dot)lists(at)enterprisedb(dot)com>
Subject: Re: Load experimentation
Date: 2009-12-10 23:29:59
Message-ID: C746C477.1B046%scott@richrelevance.com
Lists: pgsql-performance
On 12/7/09 11:12 AM, "Ben Brehmer" <benbrehmer(at)gmail(dot)com> wrote:
> Thanks for the quick responses. I will respond to all questions in one email:
>
> COPY command: Unfortunately I'm stuck with INSERTS due to the nature this data
> was generated (Hadoop/MapReduce).
If you have control over the MapReduce output, you can have it write its
result files in a format that COPY accepts.

If you don't have any control over that, it's more complicated. I use a
final-pass, map-only Hadoop job to go over the output and insert into
Postgres directly from the job, using the

INSERT INTO <table> VALUES (val1, val2, ...), (val1, val2, ...), ...

insert style from Java, with about 80 rows per INSERT statement and a single
transaction for about a thousand of these. This was faster than batch
inserts.
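As a sketch of the first option: COPY's default text format is one row per line, fields separated by tabs, with backslash escapes and \N for NULL. A MapReduce job can emit lines like this directly; the field values and class name below are hypothetical, but the escaping rules follow PostgreSQL's documented defaults.

```java
// Sketch: formatting rows for PostgreSQL's default COPY text format
// (tab-delimited, backslash escapes, \N as the NULL marker). A MapReduce
// reducer could emit these lines as its output records.
public class CopyFormat {
    // Escape one field for COPY's default text format.
    static String escapeField(String value) {
        if (value == null) {
            return "\\N"; // COPY's default NULL marker
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t");  break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Join escaped fields with the default tab delimiter into one COPY line.
    static String copyLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append('\t');
            sb.append(escapeField(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row: an id, a value containing a tab, and a NULL.
        System.out.println(copyLine("42", "a\tvalue", null));
    }
}
```

Files written this way can be loaded with `COPY <table> FROM '<file>'` (or `\copy` from psql).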
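The batching described above can be sketched as follows. This only builds the SQL strings (table name, column values, and batch size are illustrative); a real job would send each statement through JDBC, with values quoted or parameterized upstream, and commit once per ~1000 statements.

```java
// Sketch of the multi-row INSERT batching described above: roughly 80 rows
// per statement, many statements per transaction. Table name and value
// tuples are hypothetical; quoting is assumed to happen before this step.
import java.util.ArrayList;
import java.util.List;

public class BatchInsertBuilder {
    static final int ROWS_PER_INSERT = 80;

    // Build one multi-row INSERT from pre-formatted tuples like "(1, 'a')".
    static String buildInsert(String table, List<String> valueTuples) {
        return "INSERT INTO " + table + " VALUES " + String.join(", ", valueTuples);
    }

    // Split all tuples into statements of at most ROWS_PER_INSERT rows each.
    static List<String> buildStatements(String table, List<String> tuples) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < tuples.size(); i += ROWS_PER_INSERT) {
            int end = Math.min(i + ROWS_PER_INSERT, tuples.size());
            stmts.add(buildInsert(table, tuples.subList(i, end)));
        }
        return stmts;
    }

    public static void main(String[] args) {
        List<String> tuples = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            tuples.add("(" + i + ", 'row" + i + "')");
        }
        // 200 rows at 80 per statement yields three statements.
        List<String> stmts = buildStatements("my_table", tuples);
        System.out.println(stmts.size() + " statements");
    }
}
```

Each multi-row statement amortizes per-statement parse and network overhead, which is why this outperforms single-row inserts even inside one transaction.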
>
> On 07/12/2009 10:39 AM, Thom Brown wrote:
>>
>> 2009/12/7 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
>>
>>>
>>> Ben Brehmer <benbrehmer(at)gmail(dot)com> wrote:
>>>
>>>> -7.5 GB memory
>>>> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units
>>>> each)
>>>> -64-bit platform
>>>
>>>
>>> What OS?
>>>
>>>> (PostgreSQL 8.1.3)
>>>
>>> Why use such an antiquated, buggy version? Newer versions are
>>> faster.
>>>
>>> -Kevin
>>>
>>
>> I'd agree with trying to use the latest version you can.
>>
>> How are you loading this data? I'd make sure you haven't got any indices,
>> primary keys, triggers or constraints on your tables before you begin the
>> initial load; just add them after. Also use either the COPY command for
>> loading, or prepared statements. Individual insert commands will just take
>> way too long.
>>
>> Regards
>>
>> Thom
>