Re: [rfc] unicode escapes for extended strings

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sam Mason <sam(at)samason(dot)me(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [rfc] unicode escapes for extended strings
Date: 2009-04-18 12:29:05
Message-ID: e51f66da0904180529t2bf46458ga9df7909ab2aca78@mail.gmail.com
Lists: pgsql-hackers

On 4/18/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> > On Fri, Apr 17, 2009 at 07:01:47PM +0200, Martijn van Oosterhout wrote:
> >> On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
> >>> Btw, is there any good reason why we don't reject \000, \x00
> >>> in text strings?
> >>
> >> Why forbid nulls in text strings?
>
> > As far as I know, PG assumes, like most C code, that strings don't
> > contain embedded NUL characters.
>
> Yeah; we should reject them because nothing will behave very sensibly
> with them, eg
>
> regression=# select E'abc\000xyz';
> ?column?
> ----------
> abc
> (1 row)
>
> The point has come up before, and I kinda thought we *had* changed the
> lexer to reject \000. I see we haven't though. Curiously, this
> does fail:
>
> regression=# select U&'abc\0000xyz';
> ERROR: invalid byte sequence for encoding "SQL_ASCII": 0x00
> HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
>
> though that's not quite the message I'd have expected to see.

I think that's because our verifier actually *does* reject \0;
the only problem is that \0 does not set the saw_high_bit flag,
so for E'' strings the verifier simply does not get executed.
U&'' strings, on the other hand, always execute it.

unicode=# SELECT e'\xc3\xa4';
?column?
----------
ä
(1 row)

unicode=# SELECT e'\xc3\xa4\x00';
ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
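
To make the mechanism concrete, here is a rough toy model in Python. It is
only a sketch of the behaviour described above, not the actual lexer code:
saw_high_bit is the flag named in this mail, while verify_encoding,
lex_extended_string and always_verify are hypothetical names invented for
illustration.

```python
# Toy model only. saw_high_bit is the flag discussed in the mail; every
# other name here is hypothetical and not taken from PostgreSQL's source.

def verify_encoding(data: bytes, encoding: str = "utf-8") -> None:
    """Stand-in for the encoding verifier: rejects NUL and invalid sequences."""
    if b"\x00" in data:
        raise ValueError("invalid byte sequence for encoding: 0x00")
    # UnicodeDecodeError is a subclass of ValueError.
    data.decode(encoding)


def lex_extended_string(escaped: bytes, always_verify: bool = False) -> bytes:
    """Collect the bytes produced by escapes and verify them conditionally.

    always_verify=False models E'' strings (verifier runs only when a
    high-bit byte was seen); always_verify=True models U&'' strings.
    """
    saw_high_bit = False
    out = bytearray()
    for b in escaped:
        out.append(b)
        if b & 0x80:
            saw_high_bit = True  # a 0x00 byte never sets this flag
    if saw_high_bit or always_verify:
        verify_encoding(bytes(out))
    return bytes(out)


# E'abc\000xyz': no high-bit byte, so the NUL slips through unverified.
print(lex_extended_string(b"abc\x00xyz"))

# E'\xc3\xa4\x00': the high-bit bytes trigger the verifier, which rejects NUL.
try:
    lex_extended_string(b"\xc3\xa4\x00")
except ValueError as err:
    print("rejected:", err)
```

In this model the NUL is accepted whenever every other byte is plain ASCII.
The truncation to "abc" in Tom's example would then come later, from code
that treats the value as a NUL-terminated C string.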

Heh.

--
marko
