Re: COPY FROM performance improvements

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Alon Goldshuv <agoldshuv(at)greenplum(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY FROM performance improvements
Date: 2005-06-24 03:58:42
Message-ID: 200506240358.j5O3wga20563@candle.pha.pa.us
Lists: pgsql-hackers


Sounds great!

---------------------------------------------------------------------------

Alon Goldshuv wrote:
> This is a second iteration of a previous thread that went unresolved a few
> weeks ago. I made some more modifications to the code to make it compatible
> with the current COPY FROM code and it should be more agreeable this time.
>
> The main premise of the new code is that it improves the text data parsing
> speed by about 4-5x, resulting in total improvements that lie between 15% and
> 95% for data importing (higher range gains will occur on large data rows
> without many columns - implying more parsing and less converting to internal
> format). This is done by replacing char-at-a-time parsing with buffered
> parsing and also using fast scan routines and a minimum amount of
> loading/appending into the line and attribute bufs.
>
> The new code passes both COPY regression tests (copy, copy2) and doesn't
> break any of the others.
>
> It also supports encoding conversions (thanks Peter and Tatsuo for your
> feedback) and the 3 line-end types. Having said that, using COPY with
> different encodings was only minimally tested. We are looking into creating
> new tests and hopefully adding them to the postgres regression suite one day
> if it's desired by the community.
>
> This new code improves the delimited data format parsing. BINARY and CSV
> will stay the same and will be executed separately for now (therefore there
> is some code duplication). In the future I plan to write improvements to the
> CSV path too, so that it will be executed without duplication of code.
>
> Support for data that uses COPY_OLD_FE is still missing (question: what are
> the use cases? When will it be used? Please advise)
>
> I'll send out the patch soon. It's basically there to show that there is a
> way to load data faster. Future versions of the patch will be more
> complete and elegant.
>
> I'd appreciate any comments/advice.
>
> Alon.
>
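
For readers following along without the patch: the general idea of
replacing char-at-a-time parsing with a buffered scan can be sketched
roughly as below. This is only an illustration, not code from Alon's patch
or from copy.c. The names split_line and field_ref are made up for the
example, and it assumes the line is already in memory and contains no
backslash escapes, multibyte delimiters, or embedded line ends, all of
which the real COPY code has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field descriptor: points into the caller's buffer, no copy. */
typedef struct
{
    const char *start;      /* first byte of the field */
    size_t      len;        /* field length in bytes */
} field_ref;

/*
 * Split one already-buffered line into fields.  memchr() jumps from
 * delimiter to delimiter, so the loop runs once per field rather than
 * once per character.  At most max_fields entries are filled in; any
 * data beyond that is ignored.  Returns the number of entries filled.
 * An empty line yields a single empty field.
 */
size_t
split_line(const char *line, size_t len, char delim,
           field_ref *fields, size_t max_fields)
{
    const char *cur = line;
    const char *end = line + len;
    size_t      n = 0;

    assert(line != NULL && fields != NULL && max_fields > 0);

    while (n < max_fields)
    {
        /* Fast scan: find the next delimiter in the remaining bytes. */
        const char *hit = memchr(cur, delim, (size_t) (end - cur));
        const char *stop = (hit != NULL) ? hit : end;

        fields[n].start = cur;
        fields[n].len = (size_t) (stop - cur);
        n++;

        if (hit == NULL)
            break;          /* last field on the line */
        cur = hit + 1;      /* skip past the delimiter */
    }
    return n;
}
```

Calling split_line on the tab-delimited line "a\tbb\t\tccc" fills in four
field_ref entries (the third one empty) without copying any bytes.
Whether a library memchr() actually beats a hand-written loop depends on
the platform, so treat the speed claim as something to measure.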

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
