Re: to_date_valid()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "'Andreas Karlsson *EXTERN*'" <andreas(at)proxel(dot)se>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: to_date_valid()
Date: 2016-09-27 21:01:03
Message-ID: 2284.1475010063@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> I think using ValidateDate() was the right idea. That is what we use
> for checking date validity everywhere else.

Note that we've got two different CF entries, and threads, covering
fundamentally the same territory here, ie making to_timestamp et al
behave more sanely. The other thread is
https://commitfest.postgresql.org/10/713/
https://www.postgresql.org/message-id/flat/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com

Robert pointed out several times in the other thread that to_timestamp
and friends were intended to be Oracle-compatible, which I agree with.
It was also pointed out that Oracle does range validity checks on the
timestamp component fields, which I verified just now using
http://rextester.com/l/oracle_online_compiler
(which is what I found by googling after discovering that sqlfiddle
isn't working very well, sigh). I got these results:

select banner as "oracle version" from v$version
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01850: hour must be between 0 and 23

SELECT TO_TIMESTAMP('2016-06-13 19:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01851: minutes must be between 0 and 59

SELECT TO_TIMESTAMP('2016-06-13 19:59:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01852: seconds must be between 0 and 59

SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
13.06.2016 19:59:59

SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH:MI:SS') FROM dual
ORA-01849: hour must be between 1 and 12

SELECT TO_TIMESTAMP('2016-02-30 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01839: date not valid for month specified

SELECT TO_TIMESTAMP('2016-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
29.02.2016 19:59:59

SELECT TO_TIMESTAMP('2015-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01839: date not valid for month specified

I think this is sufficient evidence to show that we ought to change
to_timestamp et al. to apply range checking on the component fields.
And while I've not attempted to characterize exactly what Oracle
does with extra spaces, non-matching punctuation, etc, it's also
clear to me that their behavior is different and probably saner
than ours.

So I vote for fixing these functions to behave more like Oracle, and
forgetting about having a separate family of to_date_valid() functions
or optional parameters or anything of the sort. I might've been in
favor of that, until I saw how far down the rabbit hole this thread
had gotten. Let's just call it a bug and fix it.

Having range checking after the field scanning phase also changes the
terms of discussion about what we need to do in the scanning phase, since
it would catch many (of course not all) of the problems that arise from
field boundary issues. So I think we should get the range changing
committed first and then fool with the scanner.

The 0002 patch that Artur sent for this purpose in
https://www.postgresql.org/message-id/22dbe4e0-b550-ca86-8634-adcda0faa159@postgrespro.ru
seems like the right approach to me, though I've not read it in detail
yet. I propose to work through that and commit it.

I'm much less sold on his 0001 patch, but I tend to agree that what
we want is to adjust the scanner behavior. I do not like the
reverse-conversion wrapper that Andreas proposes here; quite aside from
micro-efficiency issues, it seems to me that that just begs the question
of how you know what the input is supposed to mean.

In short, I think we should reject this thread + CF entry altogether
and instead move forward with the approach in the other thread.
It's probably too late in the September CF to close out the other
submission entirely, but we could get the range checking committed
and then RWF the other part for reconsideration later.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-27 21:12:32 Re: Detect supported SET parameters when pg_restore is run
Previous Message Vitaly Burovoy 2016-09-27 20:48:25 Re: Detect supported SET parameters when pg_restore is run