CopyReadLineText optimization

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: <pgsql-patches(at)postgresql(dot)org>
Subject: CopyReadLineText optimization
Date: 2008-02-24 01:29:47
Message-ID: 47C0C88B.8090904@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

The purpose of CopyReadLineText is to scan the input buffer, and find
the next newline, taking into account any escape characters. It
currently operates in a loop, one byte at a time, searching for LF, CR,
or a backslash. That's a bit slow: I've been running oprofile on COPY,
and I've seen CopyReadLine take around 10% of the CPU time, and
Joshua Drake just posted a very similar profile to hackers.

Attached is a patch that modifies CopyReadLineText so that it uses
memchr to speed up the scan. The nice thing about memchr is that we can
take advantage of any clever optimizations that might be in libc or
the compiler.
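
To illustrate the idea (this is a minimal sketch, not the code from the
attached patch): the hypothetical helper find_line_end below uses memchr
to locate the next LF, then uses a second memchr to check whether a
backslash occurs before it. CR handling, CSV mode and multi-byte
encodings are deliberately left out, and the function name and return
convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: return the offset of the
 * first unescaped LF in buf[0..len), or -1 if there is none.
 * Assumes a backslash always escapes exactly the one byte that follows it.
 */
long
find_line_end(const char *buf, size_t len)
{
    size_t      pos = 0;

    while (pos < len)
    {
        const char *start = buf + pos;
        const char *nl = memchr(start, '\n', len - pos);

        /* Only look for a backslash in front of the newline candidate. */
        size_t      span = nl ? (size_t) (nl - start) : len - pos;
        const char *bs = memchr(start, '\\', span);

        if (bs == NULL)
            return nl ? (long) (nl - buf) : -1;

        /* Skip the backslash and the byte it escapes, then rescan. */
        pos = (size_t) (bs - buf) + 2;
    }
    return -1;
}
```

When a line contains no backslashes, the whole line is covered by two
memchr calls. Each backslash forces a restart of the scan, so input with
many escapes would be expected to benefit less.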

In the tests I've been running, it roughly halves the time spent in
CopyReadLine (including the new memchr calls), thus reducing the total
CPU overhead by ~5%. I'm planning to run more tests with data that has
backslashes and with different width tables to see what the worst-case
and best-case performance is like. Also, it doesn't work for CSV format
at the moment; that needs to be fixed.

5% isn't exactly breathtaking, but it's a start. I tried the same trick
on CopyReadAttributesText, but unfortunately it doesn't seem to help
there because you need to "stop" the efficient word-at-a-time scan that
memchr does (at least with glibc, YMMV) whenever there's a column
separator, while in CopyReadLineText you get to process the whole line
in one call, assuming there are no backslashes.
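
The difference can be seen in a rough sketch of a per-column scan. The
hypothetical count_fields function below is an illustration only, not
CopyReadAttributesText: it ignores escapes entirely and just counts
delimiter-separated fields. The point is that memchr has to be called
once per field, so each call only covers a few bytes before returning.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration: count the delimiter-separated fields in
 * line[0..len) using one memchr call per field. Escapes are ignored.
 */
size_t
count_fields(const char *line, size_t len, char delim)
{
    size_t      nfields = 1;
    const char *p = line;
    const char *end = line + len;
    const char *hit;

    /* Every delimiter found ends one short memchr scan and starts another. */
    while ((hit = memchr(p, delim, (size_t) (end - p))) != NULL)
    {
        nfields++;
        p = hit + 1;
    }
    return nfields;
}
```

With narrow columns the per-call setup cost is paid for every field,
which is one plausible reason the trick does not pay off there.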

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
copy-readline-memchr-2.patch text/x-diff 4.2 KB
