Re: [PATCHES] to_date() validation

From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Alex Hunsaker" <badalex(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] to_date() validation
Date: 2008-09-09 11:04:39
Message-ID: 37ed240d0809090404w2490d375l60b0fc972f899cec@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout
<kleptog(at)svana(dot)org> wrote:
> I actually had a look at this patch also, though not as thoroughly as
> Alex. There was one part that I had some thoughts about in from_char_parse_int_len:
>

Hi Martijn. Thanks for your comments.

> The use of palloc/pfree in this routine seems excessive. Does len have
> upper bound? If so a simple array will do it.
>

There isn't any *theoretical* upper bound on len. However, in
practice, with the current set of formatting nodes, the largest len
that will ever be passed to rom_char_parse_int_len() is 6 (for the
microsecond 'US' node).

I suppose I could define a constant FORMATNODE_MAX_LEN, make it
something like 10 and just use that for copying the string, rather
than palloc(). I'll give it a try.

> ! if (strlen(first) < len)
>
> Here you check the length of the remaining string and complain if it's
> too short, but:
>
> <snip>
> ! result = strtol(first, &last, 10);
> ! *src += (last - first);
>
> Here you do not note if we didn't convert the entire string. So it
> seems you are allowed to provide too few characters, as long as it's
> not the last field?

That's true. The only way to hit that condition would be to provide a
semi-bogus value in a string with no separators between the numbers.
For example:

postgres=# SELECT to_date('20081o13', 'YYYYMMDD');
ERROR: invalid value for "DD" in source string
DETAIL: Value must be an integer.

What happens here is that strtol() happily consumes the '1' for the
format node MM, and as you point out it does not complain that it
expected to consume two characters and only got one. On the next node
(DD), the function tries to start parsing an integer, but the first
character it encounters is 'o', so it freaks out.

Certainly the error message here should be more apropos, and tell the
user that the problem is in the MM node. Blaming the DD node is pure
red herring.

However, if the mistake is at the end of the string, there is no error at all:

postgres=# SELECT to_date('2008101!', 'YYYYMMDD');
to_date
------------
2008-10-01

This is because the end of the string counts as a "separator", so we
just run an unbounded strtol() on whatever characters remain in the
string.

As discussed in my response to Alex's review, I made the end of the
string a separator so that things like 'DD-MM-YYYY' would actually
work for years with more than four digits.

Now I'm wondering if that was the wrong way to go. The case for years
with more than four digits is extremely narrow, and if somebody really
wanted to parse '01-01-20008' as 1 Jan 20,008 they could always use
the 'FM' modifier to get the behaviour they want ('DD-MM-FMYYYY').

I'll send in a new version which addresses these issues.

Cheers,
BJ

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2008-09-09 11:12:32 Re: Synchronous Log Shipping Replication
Previous Message ITAGAKI Takahiro 2008-09-09 10:58:48 Re: Synchronous Log Shipping Replication

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-09-09 12:11:45 Re: [PgFoundry] Unsigned Data Types [1 of 2]
Previous Message Martijn van Oosterhout 2008-09-09 09:29:19 Re: [PATCHES] to_date() validation