Re: COPY FROM performance improvements

From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 00:41:18
Message-ID: BF1E9B3E.803C%agoldshuv@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches pgsql-performance

I did some performance checks after the recent code commit.

The good news is that the parsing speed of COPY is now MUCH faster, which is
great. It is about 5 times faster - about 100MB/sec on my machine
(previously 20MB/sec at best, usually less).

The better news is that my original patch parsing speed reaches 120MB/sec,
about 20MB/sec faster than the code that's now in CVS. This can be
significant for the long scheme of things and for large data sets. Maybe we
can improve the current code a bit more to reach this number.

I performed those measurement by executing *only the parsing logic* of the
COPY pipeline. All data conversion (functioncall3(string...)) and tuple
handling (form_heaptuple etc...) and insertion were manually disabled. So
the only code measured is reading from disk and parsing to the attribute
level.

Cheers,
Alon.

On 8/7/05 1:21 AM, "Luke Lonergan" <llonergan(at)greenplum(dot)com> wrote:

> Tom,
>
> On 8/6/05 9:08 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>>>> I had some difficulty in generating test cases that weren't largely
>>>> I/O-bound, but AFAICT the patch as applied is about the same speed
>>>> as what you submitted.
>>
>>> You achieve the important objective of knocking the parsing stage down a
>>> lot, but your parsing code is actually about 20% slower than Alon's.
>>
>> I would like to see the exact test case you are using to make this
>> claim; the tests I did suggested my code is the same speed or faster.
>
> I showed mine - you show yours :-) Apparently our e-mail crossed.
>
>> As best I can tell, my version of CopyReadAttributes is significantly
>> quicker than Alon's, approximately balancing out the fact that my
>> version of CopyReadLine is slower. I did the latter first, and would
>> now be tempted to rewrite it in the same style as CopyReadAttributes,
>> ie one pass of memory-to-memory copy using pointers rather than buffer
>> indexes.
>
> See previous timings - looks like Alon's parsing is substantially faster.
> However, I'd like him to confirm by running with the "shunt" placed at
> different stages, in this case between parse and attribute conversion (not
> attribute parse).
>
>> BTW, late today I figured out a way to get fairly reproducible
>> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
>> the actual inserts, thus:
>>
>> create table foo ...
>> create function noway() returns trigger as
>> 'begin return null; end' language plpgsql;
>> create trigger noway before insert on foo
>> for each row execute procedure noway();
>> then repeat:
>> copy foo from '/tmp/foo.data';
>
> Cool! That's a better way than hacking code and inserting shunts.
>
> Alon will likely hit this tomorrow.
>
> - Luke
>

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Andrew Dunstan 2005-08-10 01:01:38 Re: COPY FROM performance improvements
Previous Message Tom Lane 2005-08-09 22:50:01 Re: Remove all trace of EXPLAIN EXECUTE

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew Dunstan 2005-08-10 01:01:38 Re: COPY FROM performance improvements
Previous Message John A Meinel 2005-08-09 22:30:17 Re: Table locking problems?