Re: COPY ENCODING revisited

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY ENCODING revisited
Date: 2011-02-17 18:57:29
Message-ID: AANLkTik7AYu7Zz8yQ4vk5LArqdi1gR4rd=QLOU3Tt5q0@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 16, 2011 at 10:45 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> The COPY ENCODING patch was returned with feedback,
>  https://commitfest.postgresql.org/action/patch_view?id=501
> but we still need it for file_fdw.  Using client_encoding at runtime
> is reasonable for a one-time COPY command, but logically nonsense for
> persistent file_fdw tables.
>
> Based on the latest patch,
>  http://archives.postgresql.org/pgsql-hackers/2011-01/msg02903.php
> I added pg_any_to_server() and pg_server_to_any() functions instead of
> exposing FmgrInfo in pg_wchar.h.  They are the same as pg_client_to_server()
> and pg_server_to_client(), but accept any encoding. They use cached
> conversion procs only if the specified encoding matches the client encoding.
>
> According to Harada's research,
>  http://archives.postgresql.org/pgsql-hackers/2011-01/msg02397.php
> non-cached conversions are slower than cached ones. This version provides
> the same performance as before when the file and client encodings are the
> same, but would be a bit slower in other cases. We could improve the
> performance in future versions, for example by caching each used
> conversion proc in pg_do_encoding_conversion().
>
> file_fdw will support an ENCODING option. If it is not specified, file_fdw
> might have to store the client_encoding in effect at CREATE FOREIGN TABLE,
> so that even if we use a different client_encoding at SELECT, the encoding
> at definition time is used.
>
> The ENCODING 'quoted name' issue is also fixed; it always requires quoted
> names. I think we only accept non-quoted text as identifier names. Unquoted
> text should be treated as "double quoted", but encoding names are not
> identifiers.
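
The dispatch described in the quoted text (reuse the cached conversion
proc only when the requested encoding matches the client encoding, and
look the proc up on each call otherwise) might be sketched roughly as
below. This is a standalone model, not the patch itself: SketchEncoding,
SketchPath, lookup_conversion_proc() and sketch_any_to_server() are
hypothetical names, no bytes are actually converted, and the "same as
server encoding means no conversion" branch is an assumption added for
completeness.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not PostgreSQL's types. */
typedef enum { ENC_UTF8, ENC_LATIN1, ENC_EUC_JP } SketchEncoding;
typedef enum {
    PATH_NO_CONVERSION,
    PATH_CACHED_PROC,
    PATH_UNCACHED_LOOKUP
} SketchPath;

static SketchEncoding server_encoding = ENC_UTF8;
static SketchEncoding client_encoding = ENC_LATIN1;

/* Counts how often the (assumed expensive) conversion lookup runs. */
static int uncached_lookups = 0;

/* Stand-in for finding a conversion procedure at call time. */
static void
lookup_conversion_proc(SketchEncoding from, SketchEncoding to)
{
    (void) from;
    (void) to;
    uncached_lookups++;
}

/*
 * Decide how a string in 'encoding' would be converted to the server
 * encoding. Only the dispatch is modelled here.
 */
static SketchPath
sketch_any_to_server(const char *s, SketchEncoding encoding)
{
    assert(s != NULL);

    /* Assumption: input already in the server encoding needs no conversion. */
    if (encoding == server_encoding)
        return PATH_NO_CONVERSION;

    /* Matches the client encoding: reuse the proc cached for it. */
    if (encoding == client_encoding)
        return PATH_CACHED_PROC;

    /* Any other encoding: the proc is looked up again on every call. */
    lookup_conversion_proc(encoding, server_encoding);
    return PATH_UNCACHED_LOOKUP;
}
```

In this model, two calls with ENC_EUC_JP run the lookup twice, which is
the cost that caching each used conversion proc would avoid.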

I am not qualified to fully review this patch because I'm not all that
familiar with the encoding stuff, but it looks reasonably sensible on
a quick read-through. I am supportive of making a change in this area
even at this late date, because it seems to me that if we're not going
to change this then we're pretty much giving up on having a usable
file_fdw in 9.1. And since postgresql_fdw isn't in very good shape
either, that would mean we may as well give up on SQL/MED. We might
have to do that anyway, but I don't think we should do it just because
of this issue, if there's a reasonable fix.

I don't think the fact that the performance bites is a reason not to
do this. As you say, that can always be improved in the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company