Re: CopyReadLineText optimization revisited

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CopyReadLineText optimization revisited
Date: 2010-08-26 19:16:06
Message-ID: 26738.1282850166@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> * we perform possible encoding conversion early, one input block at a
> time, rather than after splitting the input into lines. This allows us
> to assume in the later stages that the data is in server encoding,
> allowing us to search for the '\n' byte without worrying about
> multi-byte characters.

Seems reasonable, although the need to deal with multibyte characters
crossing a block boundary injects some ugliness that wasn't there before.
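To make the block-boundary problem concrete, here is a minimal sketch of one way to hold back a truncated trailing character before converting a block. It assumes the input encoding is UTF-8 purely for illustration; the function names are hypothetical and are not taken from the patch or from the PostgreSQL source.

```c
#include <stddef.h>

/*
 * Total length of a UTF-8 sequence, judged from its lead byte.
 * Returns 0 for a continuation byte or an invalid lead byte.
 */
static size_t
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/*
 * Hypothetical helper: number of leading bytes of buf that form whole
 * characters.  Bytes past that point belong to a character cut off by the
 * block boundary and would have to be carried over to the next block.
 */
size_t
complete_prefix_len(const unsigned char *buf, size_t len)
{
    size_t      back = 0;
    size_t      lead_pos;

    /* Step back over at most three trailing continuation bytes. */
    while (back < len && back < 3 && (buf[len - 1 - back] & 0xC0) == 0x80)
        back++;

    /* Only continuation bytes in view: leave it to later validation. */
    if (back == len)
        return len;

    lead_pos = len - 1 - back;

    /* The last character needs more bytes than this block contains. */
    if (utf8_seq_len(buf[lead_pos]) > back + 1)
        return lead_pos;

    return len;
}
```

The caller would convert only the first complete_prefix_len() bytes and move the remainder to the front of the buffer before reading the next block. That extra bookkeeping is the ugliness in question.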

> * instead of the byte-at-a-time loop in CopyReadLineText(), use memchr()
> to find the next NL/CR character. This is where the speedup comes from.

It seems like the speedup, if any, would be extremely
platform-dependent. What have you tested on?
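For reference, the two approaches being compared look roughly like the sketch below. Both function names are hypothetical and neither is the actual CopyReadLineText() code. How much memchr() gains over the plain loop depends on how well the C library implements it, which is why the result is likely to vary by platform.

```c
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time scan for the first '\n' or '\r'; NULL if neither occurs. */
const char *
find_eol_loop(const char *buf, size_t len)
{
    size_t      i;

    for (i = 0; i < len; i++)
    {
        if (buf[i] == '\n' || buf[i] == '\r')
            return buf + i;
    }
    return NULL;
}

/*
 * The same search using memchr().  memchr() looks for a single byte value,
 * so the '\r' search is limited to the bytes before the first '\n'.
 */
const char *
find_eol_memchr(const char *buf, size_t len)
{
    const char *nl = memchr(buf, '\n', len);
    size_t      span = nl ? (size_t) (nl - buf) : len;
    const char *cr = memchr(buf, '\r', span);

    return cr ? cr : nl;
}
```

Both functions return the same pointer for any input, so timing them on identical data on each platform of interest would show whether the speedup materializes there.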

> There's a small fly in the ointment: the patch won't recognize backslash
> followed by a linefeed as an escaped linefeed. I think we should simply
> drop support for that.

I think this is likely to break apps that have worked for years. I
can't get excited about doing that in return for a "0-10%" speedup
that might only materialize on some platforms. If the numbers were
better, it'd be worth paying that price, but ...
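To illustrate what is at stake, the sketch below shows the extra check a memchr()-based scan would need in order to keep treating backslash-linefeed as data. It is an illustration under simplifying assumptions, not something from the patch: the helper names are hypothetical, and it assumes the buffer starts at the beginning of a line, so that counting backslashes backwards is reliable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf[pos] is preceded by an odd number of consecutive backslashes. */
bool
is_backslash_escaped(const char *buf, size_t pos)
{
    size_t      n = 0;

    while (n < pos && buf[pos - 1 - n] == '\\')
        n++;
    return (n % 2) == 1;
}

/*
 * Find the first '\n' that is not escaped by a backslash, or NULL if there
 * is none.  memchr() still does the scanning; only the candidate newlines
 * are examined more closely.
 */
const char *
find_unescaped_nl(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;

    while (p < end)
    {
        const char *nl = memchr(p, '\n', (size_t) (end - p));

        if (nl == NULL)
            return NULL;
        if (!is_backslash_escaped(buf, (size_t) (nl - buf)))
            return nl;
        p = nl + 1;             /* escaped linefeed: part of the data */
    }
    return NULL;
}
```

A scan that stops at the first '\n' without this check would split a row at the escaped linefeed, which is the behavior change existing applications would see.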

> It's not strictly necessary, but how about dropping support for the old
> COPY protocol, and the EOF marker \. while we're at it? It would allow
> us to drop some code, making the remaining code simpler, and reduce the
> testing effort. Thoughts on that?

Again, I think the threshold requirement for breaking compatibility
needs to be a lot higher than this.

regards, tom lane
