Re: COPY FROM performance improvements

From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: "Mark Wong" <markw(at)osdl(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 21:05:56
Message-ID: BF02B944.7295%agoldshuv@greenplum.com
Lists: pgsql-patches pgsql-performance

Hi Mark,

I improved the data *parsing* capabilities of COPY and didn't touch the
data conversion or data insertion parts of the code. The overall gain
will vary widely depending on the ratio of parsing to converting and
inserting.

Therefore, the speed increase really depends on the nature of your data:

A 100GB file with
  long data rows (lots of parsing),
  a small number of columns (few attr conversions per row), and
  fewer rows (fewer tuple insertions)

will show the best performance improvements.

However, a file of the same 100GB size with
  short data rows (minimal parsing),
  a large number of columns (many attr conversions per row),
  AND/OR
  more rows (more tuple insertions)

will show improvements, but not as significant.
In general I'd estimate a 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd, but that also depends on the hardware, disk speed,
etc. This is for TEXT format. CSV may also be faster, but not by as much
as I specified here. BINARY will stay the same as before.
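
The dependence on the parsing-to-conversion/insertion ratio can be
illustrated with a rough Amdahl's-law style estimate. This is only a sketch
of the reasoning, not a measurement of the patch: the function names and the
fractions and speedup factor used in the example are hypothetical, and real
loads are also affected by I/O, which this ignores.

```python
def overall_speedup(parse_fraction: float, parse_speedup: float) -> float:
    """Estimate the whole-load speedup when only parsing gets faster.

    parse_fraction: share of total load time spent parsing (0.0 to 1.0).
    parse_speedup:  how many times faster the parsing step becomes.
    Conversion and insertion time are assumed unchanged.
    """
    if not 0.0 <= parse_fraction <= 1.0:
        raise ValueError("parse_fraction must be between 0 and 1")
    if parse_speedup <= 0:
        raise ValueError("parse_speedup must be positive")
    remaining = (1.0 - parse_fraction) + parse_fraction / parse_speedup
    return 1.0 / remaining


def improvement_percent(parse_fraction: float, parse_speedup: float) -> float:
    """Express the estimated speedup as a percentage gain in load speed."""
    return (overall_speedup(parse_fraction, parse_speedup) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to contrast the two data shapes:
    # a parse-heavy load versus a conversion/insertion-heavy load.
    for label, fraction in (("long rows, few columns", 0.6),
                            ("short rows, many columns", 0.2)):
        gain = improvement_percent(fraction, parse_speedup=3.0)
        print(f"{label}: about {gain:.0f}% faster")
```

With the same parsing speedup, the load that spends a larger share of its
time parsing gains more overall, which is why the two cases above differ.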

HTH
Alon.

On 7/19/05 12:54 PM, "Mark Wong" <markw(at)osdl(dot)org> wrote:

> On Thu, 14 Jul 2005 17:22:18 -0700
> "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com> wrote:
>
>> I revisited my patch and removed the code duplications that were there, and
>> added support for CSV with buffered input, so CSV now runs faster too
>> (although it is not as optimized as the TEXT format parsing). So now
>> TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
>
> Hi Alon,
>
> I'm curious, what kind of system are you testing this on? I'm trying to
> load 100GB of data in our dbt3 workload on a 4-way itanium2. I'm
> interested in the results you would expect.
>
> Mark
>
