Re: BUG #4862: different results in to_date() between 8.3.7 & 8.4.RC1

From: Brendan Jurd <direvus(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Ford <jeremford(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4862: different results in to_date() between 8.3.7 & 8.4.RC1
Date: 2009-06-19 15:33:51
Message-ID: 37ed240d0906190833u19274495nae74a841fef19bd5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

2009/6/19 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> regression=# select to_date(' 2009 07', ' YYYYMM');
>  to_date
> ------------
>  2009-07-01
> (1 row)
>
> However, that just begs the question --- it seems that leading space is
> allowed in MM, just not in YYYY.  Brendan, is that intentional or is it
> a bug?
>

The reason the space between YYYY and MM is ignored isn't to do with
any special treatment of MM, rather it's to do with special treatment
of the end-of-string. Observe:

postgres=# select to_date('200906 19', 'YYYYMMDD');
to_date
------------
2009-06-19
(1 row)

What's going on here is that from_char_parse_int_len() has two
different approaches to capturing characters from the input. The
normal mode is to pull an exact number of characters out of the
string, as per the format node; for DD we pull 2 characters, for YYYY
we pull 4, and so on. However, if the FM (fill mode) flag is given,
or if the next node is known to be a non-digit character, we take a
much more tolerant approach and let strtol() grab as many characters
as it cares to. [1]

The reason for this technique is that it allows us to get away with
things like this:

postgres=# select to_date('2-6-2009', 'DD-MM-YYYY');
to_date
------------
2009-06-02
(1 row)

Or, to put it another way, the presence of separator characters trumps
a strict character-by-character interpretation of the format string.

The code treats the end-of-string as such a separator, so in your '
MM' example, the code lets strtol() off its leash and all the
remaining characters are fed into the month field.

This special treatment of separators was actually in the code long
before I got my hands on it, and I wasn't keen to change it -- I
feared that flipping that little quirk on its head would cause even
more breakage.

I hope that answers your question. to_date() is by nature a weird
beast with many strange corners in its behaviour, and it's hard to
strike a balance between backwards compatibility and Least
Astonishment. My personal preference would be for a 100% strict
interpretation of the format pattern, and a pox on anyone who has been
relying on sloppy patterns! But that's not very practical. I would
welcome any suggestions for further refinements.

Cheers,
BJ

[1] src/backend/utils/adt/formatting.c:1846

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2009-06-19 16:03:13 Re: BUG #4862: different results in to_date() between 8.3.7 & 8.4.RC1
Previous Message Tom Lane 2009-06-19 13:57:05 Re: BUG #4865: replace function returns null

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-06-19 15:40:16 Re: rc1 tarball contains partially outdated/missing man pages
Previous Message Peter Eisentraut 2009-06-19 15:29:17 Re: rc1 tarball contains partially outdated/missing man pages