Re: Bug in to_timestamp().

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: amul sul <sulamul(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Bruce Momjian <bruce(at)momjian(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in to_timestamp().
Date: 2016-11-04 18:36:37
Message-ID: 8744.1478284597@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> writes:
> I attached new version of the patch, which fix is_char_separator()
> declaration too.

I did some experimenting using
http://rextester.com/l/oracle_online_compiler

It appears that Oracle will consider a single space in the pattern
to match zero or more spaces in the input, as all of these produce
the expected result:

SELECT to_timestamp('2000JUN', 'YYYY MON') FROM dual
SELECT to_timestamp('2000 JUN', 'YYYY MON') FROM dual
SELECT to_timestamp('2000 JUN', 'YYYY MON') FROM dual

Also, a space in the pattern will match a single separator character
in the input, but not multiple separators:

SELECT to_timestamp('2000-JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('2000--JUN', 'YYYY MON') FROM dual
ORA-01843: not a valid month

And you can have whitespace along with that single separator:

SELECT to_timestamp('2000 + JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('2000 + JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('2000 ++ JUN', 'YYYY MON') FROM dual
ORA-01843: not a valid month

You can have leading whitespace, but not leading separators:

SELECT to_timestamp(' 2000 JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('/2000 JUN', 'YYYY MON') FROM dual
ORA-01841: (full) year must be between -4713 and +9999, and not be 0

These all work:

SELECT to_timestamp('2000 JUN', 'YYYY/MON') FROM dual
SELECT to_timestamp('2000JUN', 'YYYY/MON') FROM dual
SELECT to_timestamp('2000/JUN', 'YYYY/MON') FROM dual
SELECT to_timestamp('2000-JUN', 'YYYY/MON') FROM dual

but not

SELECT to_timestamp('2000//JUN', 'YYYY/MON') FROM dual
ORA-01843: not a valid month
SELECT to_timestamp('2000--JUN', 'YYYY/MON') FROM dual
ORA-01843: not a valid month

which makes it look a lot like Oracle treats separator characters in the
pattern the same as spaces (but I haven't checked their documentation to
confirm that).

The proposed patch doesn't seem to me to be trying to follow
these Oracle behaviors, but I think there is very little reason for
changing any of this stuff unless we move it closer to Oracle.

Some other nitpicking:

* I think the is-separator function would be better coded like

static bool
is_separator_char(const char *str)
{
/* ASCII printable character, but not letter or digit */
return (*str > 0x20 && *str < 0x7F &&
!(*str >= 'A' && *str <= 'Z') &&
!(*str >= 'a' && *str <= 'z') &&
!(*str >= '0' && *str <= '9'));
}

The previous way is neither readable nor remarkably efficient, and it
knows much more about the ASCII character set than it needs to.

* Don't forget the cast to unsigned char when using isspace() or other
<ctype.h> functions.

* I do not see the reason for throwing an error here:

+ /* Previous character was a backslash */
+ if (in_backslash)
+ {
+ /* After backslash should go non-space character */
+ if (isspace(*str))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid escape sequence")));
+ in_backslash = false;

Why shouldn't backslash-space be a valid quoting sequence?

I'll set this back to Waiting on Author.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-11-04 19:15:33 Re: Mention column name in error messages
Previous Message Doug Doole 2016-11-04 17:52:25 Re: Improving executor performance