Re: Rectifying wrong Date outputs

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 10:24:57
Message-ID: 4D872779.2070101@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21.03.2011 07:40, Piyush Newe wrote:
> On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>>> On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
>>> <alvherre(at)commandprompt(dot)com> wrote:
>>>> Keep in mind that the datetime stuff was abandoned by the maintainer
>>>> some years ago with quite some rough edges. Some of it has been fixed,
>>>> but a lot of bugs remain. Looks like this is one of those places and it
>>>> seems appropriate to spend some time fixing it. Since it would involve
>>>> a behavior change, it should only go to 9.2, of course.
>>
>>> I wouldn't object to fixing the problem with # of digits> # of Ys in
>>> 9.1, if the fix is simple and clear-cut. I think we are still
>>> accepting patches to make minor tweaks, like the tab-completion patch
>>> I committed yesterday. It also doesn't bother me tremendously if we
>>> push it off, but I don't think that anyone's going to be too sad if
>>> TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
>>> sensible than 3010-01-01.
>>
>> Agreed, it's certainly not too late for bug fixes in 9.1. I agree
>> that this isn't something we would want to tweak in released branches,
>> but 9.1 isn't there yet.
>>
>>
> I feel the patch for the same would be easier and was attached in the
> initial mail of this mail thread. For your ready reference, I am attaching
> the same patch here again.
>
>
>> Having said that, it's not entirely clear to me what sane behavior is
>> here. Personally I would expect that an n-Ys format spec would consume
>> at most n digits from the input. Otherwise how are you going to use
>> to_date to pick apart strings that don't have any separators?

Yeah, seems reasonable.

>> So
>> I think the problem is actually upstream of the behavior complained of
>> here. However, what we should first do is see what Oracle does in such
>> cases, because the main driving factor for these functions is Oracle
>> compatibility not what might seem sane in a vacuum.
>>
> Following is the extended chart which is comparing the behavior of Oracle,
> PG& EDBAS.
>
> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
>
> In this case, all the cases are in sync except the 1st one. I didn't
> understand why Oracle is interpreting year '1' as '2011'.

It makes sense to me. Year "1", when dat format is "Y", means the year
closest to current date that ends with 1. Or maybe the year that ends
with 1 in the current decade. This is analoguous to how two-digit years
are interpreted (except that we've hardcoded that the "current date" to
compare against is year 2000 - an assumption that will start to bite us
some time before year 2100).

> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 Error
> TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010
>
> In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
> Oracle is throwing error in 1st case since the Format ('Y') is lesser than
> the actual value ('10'). But PostgreSQL is ignoring this case and throwing
> whatever is input. The output is might not be the same was user is
> expecting.
>
> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 Error
> TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
> TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
> TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
>
> In this case, just last case was correct in PG. Rest other cases are not in
> sync with Oracle, rather the output is vague.
>
> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 Error
> TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
> TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 Error
> TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
>
> In this case, PG is giving wrong output in first 3 cases. Those need to get
> rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
> the format is lesser than the actual value. It seems this rule is not
> applicable for 2nd case in Oracle.

Yeah, quite inconsistent :-(.

These results are not in favor of the idea that a format with n Ys
always consumess up to n digits from the input. With that rule,
to_date('01-jan-2010', 'DD-MON-YY') would return "01-JAN-2020", which
isn't what Oracle does and seems quite surprising too.

So ignoring the cases where Oracle throws an error but PostgreSQL
doesn't, there's four cases where the results differ:

> *Data Format Oracle PostgreSQL EDBAS*
> TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
> TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
> TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error

IMHO our current behavior in 2nd and 4th case is so bizarre that we
should change them to match Oracle. I think we should fix the 1st too,
the notion that a single-digit year means something between 2000-2009
seems pretty useless (granted, using a single digit for year is
brain-dead to begin with).

The 3rd one is debatable. The range for three-digit years is currently
1100-2099, which is enough range for many applications. But should we
change it for the sake of matching Oracle's behavior? Not that anyone
uses YYY in practice, but still.

BTW, whatever behavior we choose, this needs to be documented. I don't
see anything in the docs on how Y, YY or YYY are expanded.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marti Raudsepp 2011-03-21 11:17:04 Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Previous Message Greg Stark 2011-03-21 10:24:22 Re: 2nd Level Buffer Cache