Re: Issue in to_timestamp/to_date while handling the quoted literal string

From: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Issue in to_timestamp/to_date while handling the quoted literal string
Date: 2019-07-25 06:47:29
Message-ID: CAF1DzPXa1kYy4Gy_kLNhuexXbaBNDPcahbru+HTisioceWkg6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the clarification Brendan, that really helps.

On Wed, Jul 24, 2019 at 6:24 PM Brendan Jurd <direvus(at)gmail(dot)com> wrote:

> Hi Suraj,
>
> I think the documentation is reasonably clear about this behaviour, quote:
>
> " In to_date, to_number, and to_timestamp, literal text and double-quoted
> strings result in skipping the number of characters contained in the
> string; for example "XX" skips two input characters (whether or not they
> are XX)."
>
> I can appreciate that this isn't the behaviour you intuitively expected
> from to_timestamp, and I don't think you'd be the first or the last. The
> purpose of these functions was never to validate that your input string
> precisely matches the non-coding parts of your format pattern. For that, I
> think you'd be better served by using regular expressions.
>
> Just as an aside, in the example you gave, the string
> '2019-05-24T23:12:45' will cast directly to timestamp just fine, so it
> isn't the kind of situation to_timestamp was really intended for. It's
> more for when your input string is in an obscure (or ambiguous) format that
> is known to you in advance.
>
> I hope that helps.
>
> Cheers
> Brendan
>
> On Wed, 24 Jul 2019 at 21:38, Suraj Kharage <
> suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
>
>> Hi,
>>
>> I noticed the issue in to_timestamp()/to_date() while handling the double
>> quote literal string. If any double quote literal characters found in
>> format, we generate the NODE_TYPE_CHAR in parse format and store that
>> actual character in FormatNode->character. n DCH_from_char, we just
>> increment the input string by length of character for NODE_TYPE_CHAR.
>> We are actually not matching these characters in input string and because
>> of this, date values get changed if quoted literal string is not identical
>> in input and format string.
>>
>> e.g:
>>
>> postgres(at)78619=#select to_timestamp('2019-05-24T23:12:45',
>> 'yyyy-mm-dd"TT"hh24:mi:ss');
>> to_timestamp
>> ---------------------------
>> 2019-05-24 03:12:45+05:30
>> (1 row)
>>
>>
>> In above example, the quoted string is 'TT', so it just increment the
>> input string by 2 while handling these characters and returned the wrong
>> hour value.
>>
>> My suggestion is to match the exact characters from quoted literal string
>> in input string and if doesn't match then throw an error.
>>
>> Attached is the POC patch which almost works for all scenarios except for
>> whitespace - as a quote character.
>>
>> Suggestions?
>> --
>> --
>>
>> Thanks & Regards,
>> Suraj kharage,
>> EnterpriseDB Corporation,
>> The Postgres Database Company.
>>
>

--
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2019-07-25 06:50:40 Re: Add parallelism and glibc dependent only options to reindexdb
Previous Message Fabien COELHO 2019-07-25 06:27:40 RE: seems like a bug in pgbench -R