Re: COPY FROM performance improvements

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 04:48:02
Message-ID: BF1ED512.C208%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Tom,

> As best I can tell, my version of CopyReadAttributes is significantly
> quicker than Alon's, approximately balancing out the fact that my
> version of CopyReadLine is slower. I did the latter first, and would
> now be tempted to rewrite it in the same style as CopyReadAttributes,
> ie one pass of memory-to-memory copy using pointers rather than buffer
> indexes.

I think you are right, except that Alon's results show that the net result
of your patch is 20% slower than his.

I think that with your speedup of CopyReadAttributes and some additional work
on CopyReadLine, the net result could be 50% faster than Alon's patch.

The key thing missing in this version is micro-parallelism in the character
processing. By "inverting the loop", that is, filling a buffer with
characters in the outer loop and then doing fast character scanning in the
inner loop with special "fix-up" cases, we exposed long runs of
pipeline-able code to the compiler.

I think there is another way to accomplish the same thing while preserving
the current structure, but it requires "strip mining" the character buffer
into fixed-size chunks that can be processed with an explicit loop that
checks for the different special characters. While it may seem artificial
(it is), it gives the compiler the ability to pipeline the character-finding
logic over long runs. The other necessary element is avoiding pipeline
stalls from the "if" conditions as much as possible.

Anyway, thanks for reviewing this code and improving it; it's important to
bring speed increases to our collective customer base. With Bizgres, we're
not satisfied with 12 MB/s; we won't stop until we saturate the I/O bus, so
we may get more extreme with the code than seems reasonable for the general
audience.

- Luke
