Re: Add ENCODING option to COPY

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add ENCODING option to COPY
Date: 2011-01-25 15:24:26
Message-ID: AANLkTi=eAtrf06WLCRTyM=KZsL41R=UoVT4QDECc7G+V@mail.gmail.com
Lists: pgsql-hackers

2011/1/25 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> On Sat, Jan 15, 2011 at 02:25, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote:
>> The patch overrides client_encoding by the added ENCODING option, and
>> restores it as soon as copy is done.
>
> We cannot do that because error messages should be encoded in the original
> encoding even during COPY commands with encoding option. Error messages
> could contain non-ASCII characters if lc_messages is set.

Agreed.

>> I see some complaints ask to use
>> pg_do_encoding_conversion() instead of
>> pg_client_to_server/server_to_client(), but the former will surely add
>> slight overhead per reading line
>
> If we want to reduce the overhead, we should cache the conversion procedure
> in CopyState. How about adding something like "FmgrInfo file_to_server_covv"
> into it?

I looked into the code and found that we cannot pass FmgrInfo * to
any of the functions declared in pg_wchar.h, since that header file is
shared with libpq, too.

For the record, I also tried pg_do_encoding_conversion() instead of
pg_client_to_server/server_to_client(), and a simple benchmark shows
it is too slow.

COPY FROM of 3000000 lines with 3 columns (~22MB TSV), three runs each:

*utf8 -> utf8 (no conversion)
13428.233ms
13322.832ms
15661.093ms

*euc_jp -> utf8 (client_encoding)
17527.470ms
16457.452ms
16522.337ms

*euc_jp -> utf8 (pg_do_encoding_conversion)
20550.983ms
21425.313ms
20774.323ms
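
To make the numbers above easier to compare, here is a small Python sketch
that averages the three runs of each case and expresses them relative to
the no-conversion case. It is only arithmetic on the timings listed above,
not part of the benchmark itself; the helper name summarize() and the
dictionary layout are made up for illustration.

```python
from statistics import mean

# Timings in milliseconds, copied from the three runs of each case above.
timings_ms = {
    "utf8 -> utf8 (no conversion)": [13428.233, 13322.832, 15661.093],
    "euc_jp -> utf8 (client_encoding)": [17527.470, 16457.452, 16522.337],
    "euc_jp -> utf8 (pg_do_encoding_conversion)": [20550.983, 21425.313, 20774.323],
}


def summarize(timings):
    """Return {label: (mean_ms, ratio_vs_first_entry)}.

    The first entry of the dict is treated as the baseline (hypothetical
    convention chosen for this sketch).
    """
    labels = list(timings)
    baseline = mean(timings[labels[0]])
    return {
        label: (mean(runs), mean(runs) / baseline)
        for label, runs in timings.items()
    }


if __name__ == "__main__":
    for label, (avg, ratio) in summarize(timings_ms).items():
        print(f"{label}: {avg:.1f} ms ({ratio:.2f}x)")
```

With only three runs per case (and one noticeably slower run in the
no-conversion case), the ratios should be read as rough indications only.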

I'll look through the code some more to see whether we have better
alternatives.

Regards,

--
Hitoshi Harada
