From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>, Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at> |
Subject: | Re: Bug in UTF8-Validation Code? |
Date: | 2007-04-04 15:57:57 |
Message-ID: | 200704041757.58574.peter_e@gmx.net |
Lists: | pgsql-hackers |
On Wednesday, 4 April 2007 16:22, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Right -- IMHO what we should be doing is reject any input to chr() which
> > is beyond plain ASCII (or maybe > 255), and create a separate function
> > (unicode_char() sounds good) to get a Unicode character from a code
> > point, converted to the local client_encoding per conversion_procs.
>
> Hm, I hadn't thought of that approach, but another idea is that the
> argument of chr() is *always* a unicode code point, and it converts
> to the current encoding. Do we really need a separate function?
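To make Tom's variant concrete: the argument of chr() would always be a Unicode code point, and the result would be that character in the current encoding, with an error when the encoding cannot represent it. (Alvaro's variant would instead reject arguments above 127, or perhaps 255.) The following is a minimal Python sketch of Tom's semantics, for illustration only. The function name `chr_as_code_point` is hypothetical, and Python codec names (`utf-8`, `iso8859_15`, `latin-1`) stand in for PostgreSQL's encodings.

```python
def chr_as_code_point(code_point: int, encoding: str) -> bytes:
    """Hypothetical chr(): the argument is always a Unicode code point,
    and the result is that character encoded in `encoding`."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"code point {code_point:#x} is out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are not characters, so they cannot be converted.
        raise ValueError(f"U+{code_point:04X} is a surrogate")
    try:
        return chr(code_point).encode(encoding)
    except UnicodeEncodeError:
        # The character exists in Unicode but not in the target encoding.
        raise ValueError(
            f"U+{code_point:04X} has no equivalent in {encoding}"
        ) from None
```

The error case is the interesting design point. U+20AC exists in LATIN9 but not in LATIN1, so the same chr(8364) call would succeed or fail depending on the encoding in effect.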
The SQL standard has a "Unicode character string literal", which looks like
this:
U&'The price is 100 \20AC.'
This is similar in spirit to our current escape mechanism available via E'...',
which, however, produces bytes rather than characters. Its advantage over a
chr()-based mechanism is that composing a string doesn't require an ugly chain
of literals, function calls, and concatenations.
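The difference from E'...' is that each escape in a U&'...' literal names a character (a code point), not a byte. The following minimal Python sketch decodes the body of such a literal into characters, which can then be encoded into whatever encoding is in effect. It assumes the escape forms of the SQL-standard literal: `\XXXX` (four hex digits), `\+XXXXXX` (six hex digits), and `\\` for a literal backslash. It also assumes the surrounding `U&'` and `'` have already been stripped and doubled quotes already collapsed. The function names are hypothetical. For simplicity this sketch rejects surrogate code points outright rather than combining surrogate pairs, and it does not handle a custom escape character.

```python
import string


def _code_point(digits: str, width: int) -> str:
    """Turn `width` hex digits into a single character, validating them."""
    if len(digits) != width or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"invalid Unicode escape: \\{digits}")
    cp = int(digits, 16)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"invalid Unicode code point: {cp:#x}")
    return chr(cp)


def unescape_unicode_literal(body: str) -> str:
    """Decode the body of a U&'...' literal into a string of code points."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])          # ordinary character, copied as-is
            i += 1
        elif body.startswith("\\\\", i):
            out.append("\\")             # \\ stands for one backslash
            i += 2
        elif body.startswith("\\+", i):
            out.append(_code_point(body[i + 2:i + 8], 6))  # \+XXXXXX
            i += 8
        else:
            out.append(_code_point(body[i + 1:i + 5], 4))  # \XXXX
            i += 5
    return "".join(out)
```

For example, `unescape_unicode_literal(r"The price is 100 \20AC.")` yields the text with a euro sign. Only at the final `.encode(...)` step does the database encoding matter, and that step is exactly the one the lexer cannot perform.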
Implementing this would, however, be a bit tricky because you don't have
access to the encoding conversion functions in the lexer. You would probably
have to map the literal to a function call and evaluate it later.
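The two-stage approach suggested here can be sketched as follows: the lexer records only the raw text of the literal, and a later stage, which knows the target encoding, performs the unescaping and conversion. This is a minimal Python illustration under stated assumptions, not PostgreSQL's parser. `UnicodeLiteralCall`, `lex`, and `evaluate` are hypothetical names, only the four-digit `\XXXX` escape is handled, and a Python codec name stands in for the server encoding.

```python
import re
from dataclasses import dataclass

# Only the four-digit \XXXX form is handled in this sketch.
_U4 = re.compile(r"\\([0-9A-Fa-f]{4})")


@dataclass(frozen=True)
class UnicodeLiteralCall:
    """Placeholder the lexer emits instead of a finished string constant."""
    body: str  # raw text between U&' and the closing quote


def lex(token: str) -> UnicodeLiteralCall:
    """Lexer stage: no encoding information is available, so defer."""
    if token.startswith("U&'") and token.endswith("'"):
        return UnicodeLiteralCall(token[3:-1])
    raise ValueError("this sketch only handles U&'...' tokens")


def evaluate(node: UnicodeLiteralCall, server_encoding: str) -> bytes:
    """Later stage: the target encoding is known, so convert now."""
    text = _U4.sub(lambda m: chr(int(m.group(1), 16)), node.body)
    return text.encode(server_encoding)
```

Deferring the work this way keeps the lexer free of any dependency on the encoding machinery. The cost is that the literal is no longer a plain constant at parse time.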
--
Peter Eisentraut
http://developer.postgresql.org/~petere/