Undocumented feature costs a lot of performance in COPY IN

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Undocumented feature costs a lot of performance in COPY IN
Date: 2001-12-04 19:49:05
Message-ID: 2841.1007495345@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

I have been fooling around profiling various ways of inserting wide
(8000-byte, not all that wide) bytea fields, per Brent Verner's note
of a few days ago. COPY IN should be, and is, the fastest way to
do it. But I was rather startled to discover that 25% of the runtime
of COPY IN went to an inefficient way of fetching single bytes from
pqcomm.c (pq_getbytes(&ch, 1) instead of ch = pq_getbyte()), and
20% of what's left after fixing that is going into the strchr() call
in CopyReadAttribute.
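
To illustrate why the single-byte fetch matters, here is a minimal sketch of a buffered reader with both a general "get n bytes" call and a single-byte fast path. This is not the pqcomm.c code: the InBuf struct and the buf_getbytes/buf_getbyte names are hypothetical stand-ins for pq_getbytes and pq_getbyte, and the buffer is a plain in-memory array rather than a socket buffer. The point is only that the general routine pays for loop setup and a memcpy() call on every byte, while the fast path is a bounds check and an array load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input buffer; NOT the real pqcomm.c structures. */
typedef struct
{
    const unsigned char *data;  /* bytes available to read */
    size_t      len;            /* total number of bytes in data */
    size_t      pos;            /* index of the next unread byte */
} InBuf;

/*
 * General-purpose fetch (stand-in for pq_getbytes): copy n bytes into dst.
 * Returns 0 on success, -1 if the buffer runs out first (bytes copied
 * before running out are still consumed).
 */
static int
buf_getbytes(InBuf *b, void *dst, size_t n)
{
    unsigned char *out = dst;

    assert(b != NULL && dst != NULL);
    while (n > 0)
    {
        size_t      avail = b->len - b->pos;
        size_t      chunk;

        if (avail == 0)
            return -1;
        chunk = (avail < n) ? avail : n;
        memcpy(out, b->data + b->pos, chunk);
        b->pos += chunk;
        out += chunk;
        n -= chunk;
    }
    return 0;
}

/*
 * Single-byte fast path (stand-in for pq_getbyte): return the next byte
 * as an int in 0..255, or -1 at end of buffer.
 */
static int
buf_getbyte(InBuf *b)
{
    assert(b != NULL);
    if (b->pos >= b->len)
        return -1;
    return b->data[b->pos++];
}
```

A caller that reads one character at a time would write `c = buf_getbyte(&buf)` rather than `buf_getbytes(&buf, &ch, 1)`; both return the same byte, but the former avoids the per-call copy machinery.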

Now the point of that strchr() call is to detect whether the current
character is the column delimiter. The COPY reference page clearly
says:

    By default, a text copy uses a tab ("\t") character as a
    delimiter between fields. The field delimiter may be changed to
    any other single character with the keyword phrase USING
    DELIMITERS. Characters in data fields which happen to match the
    delimiter character will be backslash quoted. Note that the
    delimiter is always a single character. If multiple characters
    are specified in the delimiter string, only the first character
    is used.

and indeed, only the first character is used by COPY OUT. But COPY IN
is presently coded so that if multiple characters are mentioned in
USING DELIMITERS, any one of them will be taken as a field delimiter.

I would like to change the code to just "if (c == delim[0])",
which should buy back most of that 20% and make the behavior match the
documentation. Question for the list: is this a bad change? Is anyone
out there actually using this undocumented behavior?
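
For anyone who wants to see the behavioral difference concretely, here is a small standalone sketch, not the actual CopyReadAttribute code. The names is_delim_any, is_delim_first, and count_fields are made up for illustration, and the field counter ignores backslash quoting and every other detail of the real parser. It only contrasts the strchr() test with the single-character comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Behavior described for COPY IN today: any character appearing in the
 * delimiter string is treated as a field delimiter.
 */
static int
is_delim_any(int c, const char *delim)
{
    assert(delim != NULL);
    /* strchr() also matches the terminating NUL, so rule out c == 0. */
    return c != '\0' && strchr(delim, c) != NULL;
}

/*
 * Proposed behavior: only the first character of the delimiter string
 * counts, as the documentation says.
 */
static int
is_delim_first(int c, const char *delim)
{
    assert(delim != NULL && delim[0] != '\0');
    return c == delim[0];
}

typedef int (*delim_test) (int c, const char *delim);

/*
 * Count the fields in one line under the given delimiter test.
 * Illustration only: no backslash handling, no newline handling.
 */
static size_t
count_fields(const char *line, const char *delim, delim_test test)
{
    size_t      nfields = 1;

    for (; *line != '\0'; line++)
    {
        if (test((unsigned char) *line, delim))
            nfields++;
    }
    return nfields;
}
```

With a delimiter string of ",;" the line "a,b;c" splits into three fields under the strchr() test but only two under the first-character test. With a single-character delimiter the two tests agree, so only input that relies on the multi-character behavior would be affected.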

regards, tom lane
