Fix number skipping in to_number

From: Oliver Ford <ojford(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Fix number skipping in to_number
Date: 2017-08-10 10:21:21
Message-ID: CAGMVOdvpbMqPf9XWNzOwBpzJfErkydr_fEGhmuDGa015z97mwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Prevents an issue where numbers can be skipped in the to_number()
function when the format mask contains a "G" or a "," but the input
string doesn't contain a separator. This resolves the TODO item "Fix
to_number() handling for values not matching the format string".

== Change ==

Currently, if the format mask in to_number() has a "G" or a ",", it
will assume that the input string contains a separator character in
the same place. If however a number is there instead, that number will
be silently skipped and not appear in the output. So we get:

select to_number('34,50','999,99');
to_number
-----------
340
(1 row)

This patch checks the input string when it encounters a "G" or "," in
the format mask. If the separator character is found, the code moves
over it as normal. If it's not found, then the code no longer
increments the relevant pointer so as not to skip the character. After
the patch, we get the correct result:

select to_number('34,50','999,99');
to_number
-----------
3450
(1 row)

This is in line with Oracle's result.

== Rationale ==

This patch is a small change, which leaves PostgreSQL behavior
different from Oracle behavior in related cases. Oracle's
implementation seems to read from right-to-left, and raises an
"ORA-01722: invalid number" error if there are digits in the input
string which don't have corresponding characters in the format mask. I
have chosen not to throw such errors, because there are use cases for
only returning part of a number string. For example, the following is
an error on Oracle:

select to_number('123,000', '999G') from dual;
Error report -
SQL Error: ORA-01722: invalid number

But if you wanted to only take the characters before the comma, and
discard the thousands part, you can do so on PostgreSQL with:

select to_number('123,000', '999G');
to_number
-----------
123
(1 row)

This is the current behavior. Which is why I think it makes more sense
to do what PostgreSQL currently does and read from left-to-right. The
only change, as mentioned above, is that the current behavior can skip
a digit:

select to_number('123456', '999G999');
to_number
-----------
12356
(1 row)

After the patch, this returns all the digits:

select to_number('123456', '999G999');
to_number
-----------
123456
(1 row)

== Testing ==

Tested on Windows with MinGW using the latest checkout from master.
Added regression tests to check for this new behavior. All existing
tests pass.

Attachment Content-Type Size
0001-apply-number-v1.patch application/octet-stream 3.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message AP 2017-08-10 10:41:37 Re: pgsql 10: hash indexes testing
Previous Message Beena Emerson 2017-08-10 09:52:55 Re: expanding inheritance in partition bound order