Re: Bug in to_timestamp().

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Bruce Momjian <bruce(at)momjian(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in to_timestamp().
Date: 2016-08-11 09:46:28
Message-ID: b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru
Lists: pgsql-hackers

Hello,

On 14.07.2016 12:16, Pavel Stehule wrote:
>
> last point was discussed in thread related to to_date_valid function.
>
> Regards
>
> Pavel

Thank you.

Here is my patch. It is a proof of concept.

Date/Time Formatting
--------------------

There are changes in date/time formatting rules:

- to_timestamp() and to_date() now skip spaces in the input string and
in the format string unless the FX option is used, as Amul Sul wrote in
the first message of this thread. However, Ex.2 now gives an error with
this patch (should we fix this too?).

- in the code, space characters and separator characters now have
different FormatNode types. Separator characters are ',', '-', '.',
'/' and ':'. This allows different formatting rules for spaces and for
separators.
If the FX option isn't used, PostgreSQL does not insist that a separator
in the format string match the separator in the input string.
But the number of separators must be equal with or without the FX option.

- PostgreSQL now checks whether there is a closing quote. If there is
none, an error is raised.
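
To make the rules in this list concrete, here is a minimal standalone
sketch. It is not the code from the patch. Only the two NODE_TYPE_*
values come from the patch description; every sketch_* name and
SKETCH_NODE_OTHER are hypothetical. It assumes that FX mode requires the
identical separator character, that a backslash escapes the next
character inside a quoted run, and it ignores quoting when counting
separators.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values quoted in the patch description. */
#define NODE_TYPE_SEPARATOR 4
#define NODE_TYPE_SPACE     5
/* Hypothetical catch-all, used by this sketch only. */
#define SKETCH_NODE_OTHER   0

/* True for the five separator characters: , - . / : */
static bool
sketch_is_separator(char c)
{
    /* strchr() would also match the terminating NUL, so exclude it. */
    return c != '\0' && strchr(",-./:", c) != NULL;
}

/* Classify one format-string character into a node type. */
static int
sketch_node_type(char c)
{
    if (sketch_is_separator(c))
        return NODE_TYPE_SEPARATOR;
    if (isspace((unsigned char) c))
        return NODE_TYPE_SPACE;
    return SKETCH_NODE_OTHER;
}

/* Count separator characters in a string (quoting is ignored here). */
static int
sketch_count_separators(const char *s)
{
    int n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (sketch_is_separator(*s))
            n++;
    return n;
}

/* Does an input character satisfy a separator in the format string? */
static bool
sketch_separator_matches(char fmt_c, char in_c, bool fx_mode)
{
    if (!sketch_is_separator(fmt_c) || !sketch_is_separator(in_c))
        return false;
    /* Assumption: only FX mode demands the identical character. */
    return fx_mode ? (fmt_c == in_c) : true;
}

/* Return false if a double-quoted run is opened but never closed. */
static bool
sketch_quotes_are_closed(const char *fmt)
{
    bool in_quote = false;

    assert(fmt != NULL);
    for (; *fmt != '\0'; fmt++)
    {
        if (in_quote && *fmt == '\\' && fmt[1] != '\0')
            fmt++;              /* assumed escape: skip the next character */
        else if (*fmt == '"')
            in_quote = !in_quote;
    }
    return !in_quote;
}
```

Under these assumptions, '-' in the format string accepts '/' in the
input without FX but not with FX, and a format string such as
YYYY"T HH24 would be rejected because the quote is never closed.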

PostgreSQL still does not insist that a text character in the format
string match the text character in the input string. It is not
obvious whether this should be fixed, because the characters may differ
in case or in the presence of an accent mark.
But I suppose that it is not right to check only the text character
count. For example, there is a Unicode variant of the space character,
U+00A0.

Code changes
------------

- new defines:

#define NODE_TYPE_SEPARATOR 4
#define NODE_TYPE_SPACE 5

- DCH_cache_getnew() is now called after parse_format(), because
parse_format() can now raise an error, and on the next attempt
DCH_cache_search() could return a broken cache entry.
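
The ordering issue can be illustrated with a toy cache. This is a
sketch, not PostgreSQL code: all sketch_* names, the entry layout and
the slot policy are hypothetical, and a failed parse is modelled as a
false return value rather than a raised error. The point is only that a
slot is claimed after parsing succeeds, so a failed parse leaves nothing
behind for a later search to find.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CACHE_SIZE 4
#define SKETCH_FMT_MAX    64

typedef struct
{
    bool used;
    char key[SKETCH_FMT_MAX];   /* the format string */
    int  node_count;            /* stands in for the parsed node array */
} SketchCacheEntry;

static SketchCacheEntry sketch_cache[SKETCH_CACHE_SIZE];

/* Stand-in parser: fails on an unterminated quote, else counts characters. */
static bool
sketch_parse(const char *fmt, int *node_count)
{
    bool in_quote = false;
    int  n = 0;

    for (const char *p = fmt; *p != '\0'; p++)
    {
        if (*p == '"')
            in_quote = !in_quote;
        n++;
    }
    if (in_quote)
        return false;           /* models the error raised by the parser */
    *node_count = n;
    return true;
}

/* Look up an existing entry by format string. */
static SketchCacheEntry *
sketch_cache_search(const char *fmt)
{
    for (int i = 0; i < SKETCH_CACHE_SIZE; i++)
        if (sketch_cache[i].used && strcmp(sketch_cache[i].key, fmt) == 0)
            return &sketch_cache[i];
    return NULL;
}

/* Claim a slot: first free one, else slot 0 (no real eviction policy). */
static SketchCacheEntry *
sketch_cache_getnew(const char *fmt)
{
    SketchCacheEntry *ent = &sketch_cache[0];

    for (int i = 0; i < SKETCH_CACHE_SIZE; i++)
    {
        if (!sketch_cache[i].used)
        {
            ent = &sketch_cache[i];
            break;
        }
    }
    ent->used = true;
    strncpy(ent->key, fmt, SKETCH_FMT_MAX - 1);
    ent->key[SKETCH_FMT_MAX - 1] = '\0';
    ent->node_count = 0;
    return ent;
}

/* Search first; on a miss, parse BEFORE claiming a cache slot. */
static SketchCacheEntry *
sketch_fetch(const char *fmt)
{
    SketchCacheEntry *ent;
    int               node_count;

    assert(fmt != NULL && strlen(fmt) < SKETCH_FMT_MAX);

    ent = sketch_cache_search(fmt);
    if (ent != NULL)
        return ent;

    if (!sketch_parse(fmt, &node_count))
        return NULL;            /* nothing was cached, so nothing is broken */

    ent = sketch_cache_getnew(fmt);
    ent->node_count = node_count;
    return ent;
}
```

If sketch_cache_getnew() were called before sketch_parse(), a failed
parse would leave a used slot whose key matches the format string but
whose contents were never filled in.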

This patch does not handle all the issues noted in this thread, since
there is still no consensus about them. So this patch has
proof-of-concept status and can be changed.

Of course this patch can be completely wrong. But it tries to introduce
more formal rules for formatting.

I will be grateful for notes and remarks.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
to-timestamp-format-checking.patch text/x-patch 17.6 KB
