Re: [HACKERS] Date conversion using day of week

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org, marc(at)bloodnok(dot)com, hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Date conversion using day of week
Date: 2011-03-31 17:16:26
Message-ID: 4D94B6EA.9070102@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 03/31/2011 08:00 AM, Adrian Klaver wrote:
> On Wednesday, March 30, 2011 8:39:25 pm Brendan Jurd wrote:
>> On 31 March 2011 03:15, Steve Crawford<scrawford(at)pinpointresearch(dot)com> wrote:
>>> On 03/29/2011 04:24 PM, Adrian Klaver wrote:
>>>> ...
>>>> Well the strange part is only fails for SUN:...
>>>> test(5432)aklaver=>select to_date('2011-13-SUN', 'IYYY-IW-DY');
>>>> to_date
>>>> ------------
>>>> 2011-03-28
>>>> ...
>>> You specified Sunday as the day but the date returned is a Monday. I
>>> would categorize that as a bug. (Hackers cc'd). Since Sunday is the last
>>> day of an ISO week, it should have returned 2011-04-03.
>>>
>>> My first inclination without consulting source or morning coffee is that
>>> PostgreSQL is seeing Sunday as day zero. Note that while:
>> The relevant paragraphs in the docs are:
>>
>> --
>> An ISO week date (as distinct from a Gregorian date) can be specified
>> to to_timestamp and to_date in one of two ways:
>>
>> * Year, week, and weekday: for example to_date('2006-42-4',
>> 'IYYY-IW-ID') returns the date 2006-10-19. If you omit the weekday it
>> is assumed to be 1 (Monday).
>> * Year and day of year: for example to_date('2006-291',
>> 'IYYY-IDDD') also returns 2006-10-19.
>>
>> Attempting to construct a date using a mixture of ISO week and
>> Gregorian date fields is nonsensical, and will cause an error. In the
>> context of an ISO year, the concept of a "month" or "day of month" has
>> no meaning. In the context of a Gregorian year, the ISO week has no
>> meaning. Users should avoid mixing Gregorian and ISO date
>> specifications.
>> --
>>
>> We *could* make the OP's query return the Sunday of ISO week 2011-13,
>> which would be properly written 2011-13-7, but I think the right move
>> here would be to throw the error for illegal mixture of format tokens.
>> This is a trivial change -- just a matter of changing the from_date
>> type on the DAY, Day, day, DY, Dy, dy keys.
>>
>> With the attached patch applied, this is what happens instead:
>>
>> # select to_date('2011-13-SUN', 'IYYY-IW-DY');
>> ERROR: invalid combination of date conventions
>> HINT: Do not mix Gregorian and ISO week date conventions in a
>> formatting template.
>>
>> If we wanted to make it "work", then I think the thing to do would be
>> to add a new set of formatting tokens IDY, IDAY etc. I don't like the
>> idea of interpreting DY and co. differently depending on whether the
>> other tokens happen to be ISO week or Gregorian.
> Just to play Devils advocate here, but why not? The day name is the same either
> way, it is the index that changes. I am not sure why that could not be context
> specific?
>

A week day represented as an int is ambiguous - as you mention, the
index is necessary to decode to the correct day. "Sunday" is unambiguous
so we could do something reasonable. But from everything I've read
(though I didn't actually shell out 130CHF for a full 33-page copy of
ISO8601:2004), the ISO *week* date format does not represent day-of-week
as other than a numeric value so it would not really be an ISO8601
formatted date and I would be tempted to thrown an error. However...

This whole discussion opens a #10 sized can o' worms. Admittedly, I
don't have good knowledge of any SQL-mandated interpretations of an ISO
date - but based on my reading of ISO formatting I see the following issues:

1. What we describe in the documentation as an ISO date is actually an
ISO *week* date - a special purpose format included within ISO8601.
2011-03-31 is also an ISO date as are 20110331, 20110331T013212 and
20110331T21.3344298. "Fixing" this is probably as simple as a
clarification in the documentation.

2. The ISO week-date format is defined as having the week-number
prefaced by a "W" as in 2011-W03-7. From the ISO8601 FAQ page: "Week
date is an alternative date representation used in many commercial and
industrial applications. It is: YYYY-Www-D where YYYY is the Year in the
Gregorian calendar, ww is the week of the year between 01 (the first
week) and 52 or 53 (the last week), and D is the day in the week between
1 (Monday) and 7 (Sunday). Example: 2003-W14-2 represents the second day
of the fourteenth week of 2003." However PostgreSQL does *not* accept
that as input even as specified as an "ISO" date:

select to_date('2003-W14-2', 'IYYY-IW-ID');
ERROR: invalid value "W1" for "IW"
DETAIL: Value must be an integer.

Fixing this would require both a coding change and a decision whether or
not to throw an error on incorrectly formatted input.

3. ISO8601 requires zero-padding. PostgreSQL, however, does not complain
if that padding is missing. The following should be "2011-04-2"
(actually, "2011-W04-2" as noted above) but PostgreSQL accepts:

select to_date('2011-4-2', 'IYYY-IW-ID');
to_date
------------
2011-01-25

However in ISO dates the hyphens are supposed to only be for easier
reading by humans. But if we just remove them:

select to_date('201142', 'IYYYIWID');
to_date
------------
2011-10-17

(Monday of the 42nd week).

Fix it and throw an error (and suffer the howls of anguish when backward
compatibility is shattered) or tiptoe quietly away?

Of course to make things more complicated, the above also depends to
some extent on whether you are looking at ISO 2014, ISO 2015, ISO 2711,
ISO 3307, ISO 4031, ISO8601:1999, ISO8601:2000 or, finally, ISO8601:2004
which supersedes all of the prior.

What I've concluded is that the root of the entire problem is providing
ISO formatting options in pieces at all. The ISO date format has various
requirements like ordering from largest temporal term to smallest,
zero-padding, ""W" prefacing an ISO week, no "skipping" of temporal
terms (201105 is May 2011, never the 5th of an unknown month), etc. all
intended to make an ISO date string unambiguous. As such, it should only
require a single format option saying "this is an ISO8601 date string"
and mixing of ISO and Gregorian date formatting becomes impossible.

Cheers,
Steve

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Carlo Stonebanks 2011-03-31 17:18:54 Sequence names have 64 character limit?
Previous Message Brendan Jurd 2011-03-31 16:58:48 Re: [HACKERS] Date conversion using day of week

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2011-03-31 17:31:52 Re: Bug in autovacuum.c?
Previous Message Brendan Jurd 2011-03-31 16:58:48 Re: [HACKERS] Date conversion using day of week