Quick Links

Re: Allow to_date() and to_timestamp() to accept localized names

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Arthur Zakirov <zaartur(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Allow to_date() and to_timestamp() to accept localized names
Date:	2020-01-24 15:20:30
Message-ID:	28023.1579879230@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo(dot)santamaria(at)gmail(dot)com> writes:
> On Thu, Jan 23, 2020 at 11:00 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> * It's not exactly apparent to me why this code should be concerned
>> about non-normalized characters when noplace else in the backend is.

> There is an open patch that will make the normalization functionality user
> visible [1]. So, if a user can call to_date(normalize('01 ŞUB 2010'), 'DD
> TMMON YYYY') I would vote to drop the normalization logic inside this patch
> altogether.

Works for me.

> * I have no faith in this calculation that decides how long the match
>> length was:
>> *len = element_len + name_len - norm_len;

> The proper logic would come from do_to_timestamp() receiving a normalized
> "date_txt" input, so we do not operate with unnormalize and normalize
> strings at the same time.

No, that only solves half the problem, because the downcasing
transformation can change the string length too. Two easy examples:

* In German, I believe "ß" downcases to "ss". In Latin-1 encoding that's
a length change, though I think it might accidentally not be in UTF8.

* The Turks distinguish dotted and dotless "i", so that "İ" downcases to
"i", and conversely "I" downcases to "ı". Those are length changes in
UTF8, though not in whichever Latin-N encoding works for Turkish.

Even if these cases happen not to apply to any month or day name of
the relevant language, we still have a problem, arising from the fact
that we're downcasing the whole remaining string --- so length changes
after the match could occur anyway.

> I would like to rise a couple of questions myself:

> * When compiled with DEBUG_TO_FROM_CHAR, there is a warning "‘dump_node’
> defined but not used". Should we drop this function or uncomment its usage?

Maybe, but I don't think it belongs in this patch.

> * Would it be worth moving str_tolower(localized_name)
> from seq_search_localized() into cache_locale_time()?

I think it'd complicate tracking when that cache has to be invalidated
(i.e. it now depends on more than just LC_TIME). On the whole I wouldn't
bother unless someone does the measurements to show there'd be a useful
speedup.

regards, tom lane

In response to

Re: Allow to_date() and to_timestamp() to accept localized names at 2020-01-24 09:48:19 from Juan José Santamaría Flecha

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2020-01-24 15:24:11	Re: Busted(?) optimization in ATAddForeignKeyConstraint
Previous Message	Peter Eisentraut	2020-01-24 15:05:27	Re: making the backend's json parser work in frontend code