| From: | Andrew Dunstan <andrew(at)dunslane(dot)net> | 
|---|---|
| To: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> | 
| Cc: | pgsql-patches(at)postgresql(dot)org | 
| Subject: | Re: CopyReadLineText optimization | 
| Date: | 2008-03-06 18:45:34 | 
| Message-ID: | 47D03BCE.9030909@dunslane.net | 
| Lists: | pgsql-hackers pgsql-patches | 
Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
>> Heikki Linnakangas wrote:
>>> Attached is a patch that modifies CopyReadLineText so that it uses 
>>> memchr to speed up the scan. The nice thing about memchr is that we 
>>> can take advantage of any clever optimizations that might be in libc 
>>> or compiler.
>>
>> Here's an updated version of the patch. The principle is the same, 
>> but the same optimization is now used for CSV input as well, and 
>> there's more comments.
>
> Another update attached: It occurred to me that the memchr approach is 
> only safe for server encodings, where the non-first bytes of a 
> multi-byte character always have the hi-bit set.
>
We currently make the following assumption in the code:
     * These four characters, and the CSV escape and quote characters, are
     * assumed the same in frontend and backend encodings.
     *
The four characters are the carriage return, line feed, backslash and dot.
I think the requirement might well actually be somewhat stronger than 
that: i.e. that none of these will appear as a non-first byte in any 
multi-byte client encoding character. If that's right, then we should be 
able to write CopyReadLineText without bothering about multi-byte chars. 
If it's not right then I suspect we have some cases that can fail now 
anyway. (I believe some client encodings at least use backslash in 
subsequent bytes, and that's a nasty one because the "\." end sequence 
is hard-coded.) I believe all the chars up to 0x2f are safe; that 
includes both quote chars and dot.
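
To illustrate the concern, here is a rough sketch. It is not code from copy.c, and both helper names are made up for this example. It assumes, as I suggested above, that no client encoding uses a trailing byte at or below 0x2f. The naive scan uses memchr with no regard for character boundaries, which is what makes backslash risky: in Shift-JIS, for instance, the katakana character encoded as 0x83 0x5c has 0x5c (backslash) as its second byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if byte c is assumed never to occur as a
 * non-first byte of a multi-byte character. The 0x2f cutoff is the
 * assumption made in this message, not a guarantee taken from the source.
 */
static int
byte_is_safe_to_scan(unsigned char c)
{
    return c <= 0x2f;
}

/*
 * Hypothetical helper: find the first occurrence of target in buf using
 * memchr, ignoring multi-byte character boundaries entirely.
 * Returns the byte offset, or -1 if target is not present.
 */
static ptrdiff_t
naive_find(const char *buf, size_t len, char target)
{
    const char *p;

    assert(buf != NULL);
    p = memchr(buf, target, len);
    return p ? p - buf : -1;
}
```

Given the three bytes 0x83 0x5c 0x0a, naive_find reports a backslash at offset 1 even though that byte is the tail of a two-byte character, while the line feed at offset 2 is found correctly. Under the stated assumption, byte_is_safe_to_scan accepts CR, LF, dot and both quote characters, and rejects backslash.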
cheers
andrew