From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding |
Date: | 2013-06-11 03:15:09 |
Message-ID: | 20130611031509.GB567452@tornado.leadboat.com |
Lists: | pgsql-committers pgsql-hackers |
On Sun, Jun 09, 2013 at 11:39:18AM -0400, Tom Lane wrote:
> The key point for me is that if tolower() actually does anything in the
> previous state of the code, it's more than likely going to produce
> invalidly encoded data. The consequences of that can't be good.
> You can argue that there might be people out there for whom the
> transformation accidentally produced a validly-encoded string, but how
> likely is that really? It seems much more likely that the only reason
> we've not had more complaints is that on most popular platforms, the
> code accidentally fails to fire on any UTF8 characters (or any common
> ones, anyway). On those platforms, there will be no change of behavior.
Your hypothesis is that almost all libc tolower() implementations will in
every case either (a) turn a multi-byte character to byte soup not valid in
the server encoding or (b) leave it unchanged? Quite possible. If that
hypothesis holds, I agree that the committed change does not break
compatibility. That carries a certain appeal.
I still anticipate regretting that we have approved and made reliable this
often-sufficed-by-accident behavior, particularly when the SQL standard calls
for something else. But I think I now understand your reasoning.
> The resistance to moving this code to use towlower() for non-ASCII
> mainly comes from worries about speed, I think; although there was also
> something about downcasing conversions that change the string's byte
> length being problematic for some callers.
Considering that using ASCII-only or quoted identifiers sidesteps the speed
penalty altogether, that seems a poor cause for demur.
Thanks,
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com