Re: Allow to_date() and to_timestamp() to accept localized names

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, Arthur Zakirov <zaartur(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow to_date() and to_timestamp() to accept localized names
Date: 2020-01-24 16:42:31
Message-ID: 2f83150e-a2c0-6318-3125-d9b86e421aa2@2ndquadrant.com
Lists: pgsql-hackers

On 2020-01-24 17:22, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> But that's a different POV. The input to this function could come from
>> arbitrary user input from any application whatsoever. So the only
>> reason we can get away with that is because the example regression case
>> Juan José added (which uses non-normals) does not conform to the
>> standard.
>
> I'm unsure about "conforming to standard", but I think it's reasonable
> to put the onus of doing normalization when necessary on the user.
> Otherwise, we need to move normalization logic into basically all
> the string processing functions (even texteq), which seems like a
> pretty huge cost that will benefit only a small minority of people.
> (If it's not a small minority, then where's the bug reports complaining
> that we don't do it today?)

These reports do exist, and this behavior is known. However, the impact
is mostly that results "look wrong" (strings look the same but don't
compare as equal) rather than inconsistency or corruption, so it's
mostly shrugged off. The nondeterministic collation feature was
introduced in part to deal with this; the pending normalization patch
is another such facility. However, this behavior is baked deeply into
Unicode, so no single feature or facility will simply make it go away.
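
To illustrate the "looks the same but doesn't compare as equal" effect, here is a minimal sketch in Python (not PostgreSQL code). The sample word and the variable names are assumptions chosen for illustration; the only library call used is the standard unicodedata.normalize().

```python
import unicodedata

# Two spellings that render identically but use different code points.
composed = "mi\u00e9rcoles"     # U+00E9: precomposed e with acute accent
decomposed = "mie\u0301rcoles"  # "e" followed by U+0301 combining acute

# A plain code-point comparison treats them as different strings.
naive_equal = composed == decomposed

# After normalizing both sides to the same form (NFC here), they match.
normalized_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(naive_equal, normalized_equal)
```

Whether the caller or the lookup function should do that normalization is exactly the question being discussed here; the sketch only shows why the mismatch occurs.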

AFAICT, we haven't so far had any code that does a lookup of non-ASCII
strings in a table, so that's why we haven't had this discussion yet.

Now that I think about it, you could also make an argument that this
should be handled through collation, so the function that looks up the
string in the locale table should go through texteq. However, this
would mostly satisfy the purists but create a bizarre user experience.

Looking through the patch quickly, if you want to get Unicode-fancy,
doing a case-insensitive comparison by lower-casing both strings is
also wrong in corner cases. All the Greek month names end in sigma,
which has a distinct word-final lower-case form, so I suspect that this
patch might not work correctly in such cases.
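
A rough sketch of the sigma corner case, again in Python rather than the patch's C code. The helper per_char_lower() is hypothetical and only stands in for a simple character-by-character lower-casing; I am not claiming the patch does exactly this. The sample strings are an illustrative Greek month name in lower case (with final sigma) and the same name typed in upper case.

```python
def per_char_lower(s: str) -> str:
    """Hypothetical helper: lower-case each character in isolation,
    with no knowledge of its position in the word."""
    return "".join(ch.lower() for ch in s)

# Lower-case spelling, ending in final sigma (U+03C2).
stored = "\u03bc\u03ac\u03c1\u03c4\u03b9\u03bf\u03c2"
# Same name in upper case, ending in capital sigma (U+03A3).
typed = "\u039c\u0386\u03a1\u03a4\u0399\u039f\u03a3"

# Character-wise lowering maps capital sigma to medial sigma (U+03C3),
# which is not the final sigma in the stored name, so the match fails.
naive_match = per_char_lower(typed) == per_char_lower(stored)

# Unicode case folding maps both sigma forms to U+03C3, so it matches.
folded_match = typed.casefold() == stored.casefold()

print(naive_match, folded_match)
```

Case folding is only one possible way around this; going through a collation-aware comparison would be another.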

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
