Re: client encoding name normalization in psycopg 2.4

From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Federico Di Gregorio <federico(dot)digregorio(at)dndg(dot)it>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: client encoding name normalization in psycopg 2.4
Date: 2011-04-08 12:59:35
Message-ID: BANLkTi=2k34-Sn87if3SmEeGi+_Uqui9=w@mail.gmail.com
Lists: psycopg

On Fri, Apr 8, 2011 at 12:34 PM, Federico Di Gregorio
<federico(dot)digregorio(at)dndg(dot)it> wrote:
> On 07/04/11 21:46, Peter Eisentraut wrote:
> [snip]
>> Attached is a patch that implements that.  Note that the PostgreSQL
>> backend version of this actually lowercases the encoding names during
>> normalization.  I have made this patch uppercase them to keep the patch
>> smaller, but you may want to consider doing the lowercasing, to keep
>> things consistent.
>
> The patch seems fine to me. I'll check it in later during the we.

I was working on the patch, but one thing is not straightforward.

I think assuming that psycopg2.extensions.encodings[conn.encoding]
will always work is reasonable (also because it's the only way to
convert the PG encoding to a Python encoding). The patch breaks this
assumption, without which getting the Python codec name from the PG
encoding becomes a convoluted operation.

A better fix is probably to set connection.encoding to the normalized
string, so that the lookup will always work. This means that
connection.encoding is no longer exactly what SHOW client_encoding
returns, but I don't think this really matters (besides, the current
code already converts it to uppercase).
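
To illustrate the idea, here is a minimal sketch in plain Python. It is
not the actual psycopg code: the Connection class, the
normalize_encoding helper and the ENCODINGS excerpt are all hypothetical
stand-ins, and I'm assuming normalization means "keep only ASCII letters
and digits, then uppercase".

```python
import re

# Hypothetical excerpt of a mapping keyed by normalized PostgreSQL names.
ENCODINGS = {"UTF8": "utf_8", "LATIN1": "iso8859_1", "EUCJP": "euc_jp"}


def normalize_encoding(name):
    """Keep ASCII letters and digits only, then uppercase (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()


class Connection:
    """Stand-in for a connection object; not psycopg's real class."""

    def __init__(self, reported_encoding):
        # Store the normalized name rather than the raw string, so that
        # callers can use it directly as a key into the mapping.
        self.encoding = normalize_encoding(reported_encoding)


conn = Connection("euc-jp")
codec = ENCODINGS[conn.encoding]  # plain lookup, no extra step needed
print(conn.encoding, codec)
```

With the attribute already normalized, the lookup stays a single
dictionary access whatever spelling the client asked for.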

Note that changing the encodings mapping to use the normalized names as
keys is not required: the mapping is extended with the normalized names
when the psycopg2.extensions module is imported. Although the
non-normalized names would no longer be strictly required, I'd prefer to
keep them, as these variants are the ones published in the PG
documentation (http://www.postgresql.org/docs/9.0/static/multibyte.html)
and I think it's a good thing to have them mapped to Python.
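
Extending the mapping at import time could look roughly like the sketch
below. Again this is only an illustration: the encodings dict is a small
hypothetical excerpt (not the real contents of psycopg2.extensions), and
normalize_encoding is an assumed helper, not a function psycopg exports.

```python
import re

# Hypothetical excerpt; keys are spelled as in the PG documentation.
encodings = {
    "UTF8": "utf_8",
    "LATIN1": "iso8859_1",
    "EUC_JP": "euc_jp",
    "SQL_ASCII": "ascii",
}


def normalize_encoding(name):
    """Keep ASCII letters and digits only, then uppercase (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()


# Add normalized aliases once, without dropping or overwriting the
# documented spellings. Iterate over a copy because the dict grows.
for pg_name, py_codec in list(encodings.items()):
    encodings.setdefault(normalize_encoding(pg_name), py_codec)

# Both the documented and the normalized key now reach the same codec.
print(encodings["EUC_JP"], encodings["EUCJP"])
```

Names that are already in normalized form (UTF8, LATIN1) map to
themselves, so setdefault leaves those entries untouched.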

I've implemented the above in a fix-encoding branch in my GitHub
repository. If it's fine I'll merge it into my devel branch (the unit
tests now pass entirely with a non-normalized PGCLIENTENCODING set).

-- Daniele
