Re: Server-side support of all encodings

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: dezso(dot)zoltan(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Server-side support of all encodings
Date: 2007-03-30 05:38:50
Message-ID: 20070330.143850.42801214.t-ishii@sraoss.co.jp
Lists: pgsql-hackers
> Hello Everyone,
> 
> I very much understand why SJIS is not a server encoding. It contains
> ASCII second bytes (including \ and ', both of which can be really
> nasty inside normal SQL) and further, half-width katakana is
> represented as one-byte characters, incidentally two of which coincide
> with a kanji.
> 
> My question is, however: what would be the best practice if it was
> imperative to use SJIS encoding for texts and no built-in conversions
> are useful? To elaborate, I need to support Japanese emoji characters,
> which are special emoticons for mobile phones. These characters are
> usually in a region that is not specified by the standard SJIS,
> therefore they are not properly converted either to EUC or UTF8 (which
> would be my preferred choice, but unfortunately not all mobile phones
> support it, so conversion is still necessary - from what I've seen,
> the new SJIS_2004 map seems to define these entities, but I'm not 100%
> sure they all get converted properly).
> 
> I inherited a system in which this problem is "bypassed" by setting
> SQL_ASCII server encoding, but that is not the best solution (full
> text search is rendered useless and occasionally the special character
> issue rears its ugly head - not only do we have to deal with normal
> SQL injection, but also encoding-based injections) (and for the real
> WTF, my predecessor converted everything to EUC before inserting -
> eventually losing all the emojis and creating all sorts of strange
> phenomena, like tables with one column in EUC until a certain date and
> SJIS from then on, while EUC for all other columns).
> 
> Is there a way to properly deal with SJIS+emoji extensions (a patch
> I'm not aware of, for example), is it considered a TODO for further
> releases, or should I consider augmenting Postgres in some way (if the
> latter, could you provide any pointers on how to proceed?)

You can always use CREATE CONVERSION for this kind of purpose.
Create your own conversion map between SJIS <--> EUC or UTF-8.
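
As a rough illustration of what such a custom conversion map has to do,
here is a small Python sketch. It is not PostgreSQL's conversion-function
interface (the function registered with CREATE CONVERSION is typically
written in C); it only shows the lookup logic. The names sjis_to_utf8,
utf8_to_sjis and the EMOJI_SJIS_TO_UCS table are hypothetical, and the
single table entry is a placeholder that would have to be replaced with
the values from the carrier's published emoji table.

```python
# Hypothetical override table: two-byte SJIS emoji code -> Unicode character.
# The one entry below is a placeholder, not an authoritative mapping.
EMOJI_SJIS_TO_UCS = {
    b"\xf8\x9f": "\ue63e",
}
# Reverse table for the UTF-8 -> SJIS direction.
UCS_TO_EMOJI_SJIS = {ucs: sjis for sjis, ucs in EMOJI_SJIS_TO_UCS.items()}


def is_lead_byte(b: int) -> bool:
    """Return True if b is the first byte of a two-byte Shift_JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def sjis_to_utf8(data: bytes) -> bytes:
    """Convert SJIS bytes to UTF-8, consulting the emoji table first."""
    out = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            pair = data[i:i + 2]
            if len(pair) < 2:
                raise ValueError("truncated two-byte sequence at end of input")
            # Emoji overrides win; everything else uses the standard codec.
            ch = EMOJI_SJIS_TO_UCS.get(pair)
            if ch is None:
                ch = pair.decode("shift_jis")
            out.append(ch)
            i += 2
        else:
            # ASCII or half-width katakana: one byte per character.
            out.append(data[i:i + 1].decode("shift_jis"))
            i += 1
    return "".join(out).encode("utf-8")


def utf8_to_sjis(data: bytes) -> bytes:
    """Convert UTF-8 bytes back to SJIS, restoring emoji from the table."""
    out = bytearray()
    for ch in data.decode("utf-8"):
        out += UCS_TO_EMOJI_SJIS.get(ch) or ch.encode("shift_jis")
    return bytes(out)
```

The input is walked byte by byte, tracking lead bytes, because an SJIS
second byte can be an ASCII value such as \; a plain byte-level search and
replace would misread those trailing bytes.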
--
Tatsuo Ishii
SRA OSS, Inc. Japan
