Re: Issue in to_timestamp/to_date while handling the quoted literal string

From: Brendan Jurd <direvus(at)gmail(dot)com>
To: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Issue in to_timestamp/to_date while handling the quoted literal string
Date: 2019-07-24 12:54:39
Message-ID: CADxJZo2WbU8w+vBHA9fZ7HkfVnwABMwe3N6+t5BwsLATi8vNbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Suraj,

I think the documentation is reasonably clear about this behaviour, quote:

" In to_date, to_number, and to_timestamp, literal text and double-quoted
strings result in skipping the number of characters contained in the
string; for example "XX" skips two input characters (whether or not they
are XX)."

I can appreciate that this isn't the behaviour you intuitively expected
from to_timestamp, and I don't think you'd be the first or the last. The
purpose of these functions was never to validate that your input string
precisely matches the non-coding parts of your format pattern. For that, I
think you'd be better served by using regular expressions.

Just as an aside, in the example you gave, the string '2019-05-24T23:12:45'
will cast directly to timestamp just fine, so it isn't the kind of
situation to_timestamp was really intended for. It's more for when your
input string is in an obscure (or ambiguous) format that is known to you in
advance.

I hope that helps.

Cheers
Brendan

On Wed, 24 Jul 2019 at 21:38, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
wrote:

> Hi,
>
> I noticed the issue in to_timestamp()/to_date() while handling the double
> quote literal string. If any double quote literal characters found in
> format, we generate the NODE_TYPE_CHAR in parse format and store that
> actual character in FormatNode->character. n DCH_from_char, we just
> increment the input string by length of character for NODE_TYPE_CHAR.
> We are actually not matching these characters in input string and because
> of this, date values get changed if quoted literal string is not identical
> in input and format string.
>
> e.g:
>
> postgres(at)78619=#select to_timestamp('2019-05-24T23:12:45',
> 'yyyy-mm-dd"TT"hh24:mi:ss');
> to_timestamp
> ---------------------------
> 2019-05-24 03:12:45+05:30
> (1 row)
>
>
> In above example, the quoted string is 'TT', so it just increment the
> input string by 2 while handling these characters and returned the wrong
> hour value.
>
> My suggestion is to match the exact characters from quoted literal string
> in input string and if doesn't match then throw an error.
>
> Attached is the POC patch which almost works for all scenarios except for
> whitespace - as a quote character.
>
> Suggestions?
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2019-07-24 12:59:05 Re: psql - add SHOW_ALL_RESULTS option
Previous Message Jehan-Guillaume de Rorthais 2019-07-24 12:33:27 Re: Fetching timeline during recovery