Back in March, I played around with various hacks to improve COPY FROM
I got busy with other stuff, but I now got around to try what I planned
back then. I don't know if I have the time to finish this for 8.4, but
might as well post what I've got.
The basic idea is to replace the custom loop in CopyReadLineText with
memchr(), because memchr() is very fast. To make that possible, perform
the client-server encoding conversion on each raw block that we read in,
before splitting it into lines. That way CopyReadLineText only needs to
deal with server encodings, which is required for the memchr() to be safe.
Attached is a quick patch for that. Think of it as a prototype; I
haven't tested it much, and I feel that it needs some further cleanup.
Quick testing suggests that it gives 0-20% speedup, depending on the
data. Very narrow tables don't benefit much, but the wider the table,
the bigger the gain. I haven't found a case where it performs worse.
I'm only using memchr() with non-csv format at the moment. It could be
used for CSV as well, but it's more complicated because in CSV mode we
need to keep track of the escapes.
Some collateral damage:
\. no longer works. If we only accept \. on a new line, like we already
do in CSV mode, it shouldn't be hard or expensive to make it work again.
The manual already suggests that we only accept it on a single line:
"End of data can be represented by a single line containing just
Escaping a linefeed or carriage return by prepending it with a backslash
no longer works. You have to use \n and \r. The manual already warns
against doing that, so I think we could easily drop support for it. If
we wanted, it wouldn't be very hard to make it work, though: after
hitting a newline, scan back and count how many backslashes there is
before the newline. An odd number means that it's an escaped newline,
even number (including 0) means it's a real newline.
For the sake of simplifying the code, would anyone care if we removed
support for COPY with protocol version 2?
pgsql-hackers by date
|Next:||From: Alvaro Herrera||Date: 2008-10-30 13:18:06|
|Subject: Re: Block-level CRC checks|
|Previous:||From: Simon Riggs||Date: 2008-10-30 13:04:02|
|Subject: Re: Hot Standby: Caches and Locks|