COPY FROM performance improvements

From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: COPY FROM performance improvements
Date: 2005-06-24 02:46:15
Message-ID: BEE0C209.602B%agoldshuv@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Here is the patch I described in -hackers yesterday (copied again below).

Alon.

This is a second iteration of a previous thread that didn't resolve few
weeks ago. I made some more modifications to the code to make it compatible
with the current COPY FROM code and it should be more agreeable this time.

The main premise of the new code is that it improves the text data parsing
speed by about 4-5x, resulting in total improvements that lie between 15% to
95% for data importing (higher range gains will occur on large data rows
without many columns - implying more parsing and less converting to internal
format). This is done by replacing a char-at-a-time parsing with buffered
parsing and also using fast scan routines and minimum amount of
loading/appending into line and attribute buf.

The new code passes both COPY regression tests (copy, copy2) and doesn't
break any of the others.

It also supports encoding conversions (thanks Peter and Tatsuo and your
feedback) and the 3 line-end types. Having said that, using COPY with
different encodings was only minimally tested. We are looking into creating
new tests and hopefully add them to postgres regression suite one day if
it's desired by the community.

This new code is improving the delimited data format parsing. BINARY and CSV
will stay the same and will be executed separately for now (therefore there
is some code duplication) In the future I plan to write improvements to the
CSV path too, so that it will be executed without duplication of code.

I am still missing supporting data that uses COPY_OLD_FE (question: what are
the use cases? When will it be used? Please advise)

The patch is basically there to show that there is a
way to load data faster. In future releases of the patch it will be more
complete and elegant.

I'll appreciate any comments/advices.

Alon.

Attachment Content-Type Size
fast_copy_alon_V4.patch application/octet-stream 79.2 KB

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Luke Lonergan 2005-06-24 03:27:53 Re: COPY FROM performance improvements
Previous Message Tatsuo Ishii 2005-06-24 02:43:41 Re: EUC_JP and SJIS conversion improvement