Re: [HACKERS] Bug in to_timestamp().

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Bruce Momjian <bruce(at)momjian(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Bug in to_timestamp().
Date: 2018-08-02 15:17:01
Message-ID: CAPpHfdtqOSniGJRvJ2zaaE8=eMB8XDnzvVS-9c3Xufaw=iPA+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 1, 2018 at 3:17 PM Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> wrote:
> On Mon, Jul 23, 2018 at 05:21:43PM +0300, Alexander Korotkov wrote:
> > Thank you, Arthur. These examples shows downside of this patch, where
> > users may be faced with incompatibility. But it's good that this
> > situation can be handled by altering format string. I think these
> > examples should be added to the documentation and highlighted in
> > release notes.
>
> I updated the documentation. I added a tip text which explains
> how to_timestamp() and to_date() handled ordinary text prior to
> PostgreSQL 11 and how they should handle ordinary text now.
>
> There is now changes in the code and tests.

Thank you, Arthur!

I made following observations on Oracle behavior.
1) Oracle also supports parsing TZH in to_timestamp_tz() function.
Previously, in order to handle TZH (which could be negative) we
decided to skip spaces before fields, but not after fields [1][2].
That leads to behavior, which is both incompatible with Oracle and
unlogical (at least in my opinion). But after exploration of
to_timestamp_tz() behavior I found that Oracle can distinguish minus
sign used as separator from minus sign in TZH field.

# select to_char(to_timestamp_tz('2018-01-01 -10', 'YYYY MM DD TZH'),
'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00

# select to_char(to_timestamp_tz('2018-01-01--10', 'YYYY MM DD TZH'),
'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00

# select to_char(to_timestamp_tz('2018-01-01 -10', 'YYYY MM DD
TZH'), 'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00

# select to_char(to_timestamp_tz('2018-01-01---10', 'YYYY MM DD
TZH'), 'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00

So, if number of spaces/separators between fields in input string is
more than in format string and list space/separator skipped is minus
sign, then it decides to glue that minus sign to TZH. I think this
behavior is nice to follow, because it allows us to skip spaces after
fields.

2) It appears that Oracle skips spaces not only before fields, but
also in the beginning of the input string.

SELECT to_timestamp(' -2018', ' YYYY') FROM dual
# 01.08.2018 00:00:00

In the example given it seems that first space is skipped, while minus
sign is matched to space.

3) When multiple separators are specified in the format string, Oracle
doesn't allow skipping arbitrary number of spaces before each
separator as we did.

# SELECT to_timestamp('2018- -01 02', 'YYYY--MM-DD') FROM dual
ORA-01843: not a valid month

4) Spaces and separators in format string seem to be handled in the
same way in non-FX mode. But strange things happen when you mix
spaces and separators there.

# SELECT to_timestamp('2018- -01 02', 'YYYY---MM-DD') FROM dual
02.01.2018 00:00:00

# SELECT to_timestamp('2018- -01 02', 'YYYY MM-DD') FROM dual
02.01.2018 00:00:00

# SELECT to_timestamp('2018- -01 02', 'YYYY- -MM-DD') FROM dual
ORA-01843: not a valid month

After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)

1+2+3 are implemented in the attached patch, but not 4. I think that
it's only worth to follow Oracle when it does reasonable things.

I also plan to revise documentation and regression tests in the patch.
But I wanted to share my results so that everybody can give a feedback
if any.

1. https://www.postgresql.org/message-id/CAEepm%3D1Y7z1VSisBKxa6x3E-jP-%2B%3DrWfw25q_PH2nGjqVcX-rw%40mail.gmail.com
2. https://www.postgresql.org/message-id/20180112125848.GA32559%40zakirov.localdomain------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-to-timestamp-format-checking-v14.patch application/octet-stream 15.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-08-02 16:03:09 Re: Ideas for a relcache test mode about missing invalidations
Previous Message Nikhil Sontakke 2018-08-02 15:09:19 Re: [HACKERS] logical decoding of two-phase transactions