Re: pg should ignore u+200b zero width space

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: James Cloos <cloos(at)jhcloos(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: pg should ignore u+200b zero width space
Date: 2020-11-03 14:52:47
Message-ID: 919181.1604415167@sss.pgh.pa.us
Lists: pgsql-bugs

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> On 03/11/2020 15:41, James Cloos wrote:
>> pg should treat a no break space after whitespace as just more
>> whitespace.

> Hmm. I'm not sure if changing the behavior is a good idea, but a hint in
> the error message would be nice. Something like:

The difficulty with doing anything in this space --- whether it be
ignoring, throwing an error, or whatever --- is that it makes the
lexer's behavior encoding-sensitive and potentially locale-sensitive.
That's problematic for all sorts of reasons. One of the worst is
that frontend programs such as psql and ecpg also have SQL lexers,
and there'd be no way to keep their behavior in precise sync with
the backend's. (They might not even be running in the same encoding,
never mind locale.) It might even be possible to build security
holes around that; recall the fun we've had with trying to lock
down quoting rules in encodings where backslash can be part of a
multibyte character :-(.
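
To make the encoding-sensitivity point concrete, here is a minimal sketch in Python (not PostgreSQL code). It assumes the input might be labelled UTF-8, LATIN1, or Shift JIS; the message itself names no specific encodings, so those choices and all variable names are illustrative only. It shows that the bytes for U+200B only mean "zero width space" under UTF-8, and that in Shift JIS a backslash byte can be the trailing byte of a multibyte character.

```python
# U+200B ZERO WIDTH SPACE, the character from the subject line.
ZWSP = "\u200b"

# In UTF-8 it is three bytes, none of which is ASCII whitespace,
# so a byte-oriented lexer just sees three non-ASCII bytes.
zwsp_utf8 = ZWSP.encode("utf-8")

# The same three bytes read as LATIN1 (ISO 8859-1) are a different
# string entirely: U+00E2 followed by two C1 control characters.
as_latin1 = zwsp_utf8.decode("latin-1")

# LATIN1 has no representation for U+200B at all.
try:
    ZWSP.encode("latin-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

# Backslash hazard: in Shift JIS, KATAKANA LETTER SO (U+30BD) is
# encoded as 0x83 0x5C, and 0x5C is also the ASCII backslash.
so_sjis = "\u30bd".encode("shift_jis")
trailing_is_backslash = so_sjis[-1:] == b"\\"

print(zwsp_utf8.hex(), repr(as_latin1), latin1_can_encode)
print(so_sjis.hex(), trailing_is_backslash)
```

A lexer that wanted to special-case U+200B would therefore have to know which encoding the text is in before it could even recognize the character, and a frontend lexer running under a different encoding could reach a different answer for the same bytes.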

Perhaps it'd be all right to confine the change in behavior to
just modifying the error text in cases where we were going to
throw an error anyway. But I think this is much harder than
it sounds to do in a valid, safe way.

regards, tom lane
