Re: Allow to_date() and to_timestamp() to accept localized names

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Arthur Zakirov <zaartur(at)gmail(dot)com>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow to_date() and to_timestamp() to accept localized names
Date: 2020-01-23 22:00:53
Message-ID: 20326.1579816853@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's a v7 patch, rebased over my recent hacking, and addressing
most of the complaints I raised in <31691(dot)1579648824(at)sss(dot)pgh(dot)pa(dot)us>.
However, I've got some new complaints now that I've looked harder
at the code:

* It's not exactly apparent to me why this code should be concerned
about non-normalized characters when noplace else in the backend is.
I think we should either rip that out entirely, or move the logic
into str_tolower (and hence also str_toupper etc). I'd tend to favor
the former, but of course I don't speak any languages where this would
be an issue. Perhaps a better idea is to invent a new SQL function
that normalizes a given string, and expect users to call that first
if they'd like to_date() to match unnormalized text.

* I have no faith in this calculation that decides how long the match
length was:

*len = element_len + name_len - norm_len;

I seriously doubt that this does the right thing if either the
normalization or the downcasing changed the byte length of the
string. I'm not actually sure how we can do that correctly.
There's no good way to know whether the changes happened within
the part of the "name" string that we matched, or the part beyond
what we matched, but we only want to count the former. So this
seems like a pretty hard problem, and even if this logic is somehow
correct as it stands, it needs a comment explaining why.

* I'm still concerned about the risk of integer overflow in the
string allocations in seq_search_localized(). Those need to be
adjusted to be more paranoid, similar to the code in e.g. str_tolower.

regards, tom lane

Attachment Content-Type Size
0001-Allow-localized-month-names-to_date-v7.patch text/x-diff 12.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-01-23 22:11:57 Busted(?) optimization in ATAddForeignKeyConstraint
Previous Message Alvaro Herrera 2020-01-23 21:47:43 Re: progress report for ANALYZE