Re: Fix number skipping in to_number

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Oliver Ford <ojford(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix number skipping in to_number
Date: 2017-08-17 00:45:42
Message-ID: CAEepm=2KXZ+8BdJsopK0Bq=GuJRsYnDbwqf0U_HsocN46xE05w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 10, 2017 at 10:21 PM, Oliver Ford <ojford(at)gmail(dot)com> wrote:
> Prevents an issue where numbers can be skipped in the to_number()
> function when the format mask contains a "G" or a "," but the input
> string doesn't contain a separator. This resolves the TODO item "Fix
> to_number() handling for values not matching the format string".

From the TODO I found the previous discussion here:

https://www.postgresql.org/message-id/flat/be46a4f30909212157o71dc82bep7e074f9fa7eb1d14%40mail.gmail.com

There were some votes against changing it and some votes for. I don't
see what's good about the current behaviour. It's hard (though not
impossible) to imagine that someone is depending on that weirdness,
and it's harder to believe that anyone wants that behaviour knowingly.
For people migrating from Oracle or writing application portable to
both, the current behaviour sucks. It's not a SQL standard feature,
it's a be-like-Oracle feature, so it should be like Oracle. Aside
from that, by the principle of least astonishment alone I think it's
pretty clear that "a separator goes here" shouldn't mean "eat a
digit".

So +1 from me.

> == Change ==
>
> Currently, if the format mask in to_number() has a "G" or a ",", it
> will assume that the input string contains a separator character in
> the same place. If however a number is there instead, that number will
> be silently skipped and not appear in the output. So we get:
>
> select to_number('34,50','999,99');
> to_number
> -----------
> 340
> (1 row)
>
> This patch checks the input string when it encounters a "G" or "," in
> the format mask. If the separator character is found, the code moves
> over it as normal. If it's not found, then the code no longer
> increments the relevant pointer so as not to skip the character. After
> the patch, we get the correct result:
>
> select to_number('34,50','999,99');
> to_number
> -----------
> 3450
> (1 row)
>
> This is in line with Oracle's result.

In other words the separators are completely ignored. Presumably the
only reason you'd have them at all is so that you could use the same
format string in to_number() and to_char() calls to do round-trip
conversions. That seems reasonable to me.

> == Rationale ==
>
> This patch is a small change, which leaves PostgreSQL behavior
> different from Oracle behavior in related cases. Oracle's
> implementation seems to read from right-to-left, and raises an
> "ORA-01722: invalid number" error if there are digits in the input
> string which don't have corresponding characters in the format mask. I
> have chosen not to throw such errors, because there are use cases for
> only returning part of a number string. For example, the following is
> an error on Oracle:
>
> select to_number('123,000', '999G') from dual;
> Error report -
> SQL Error: ORA-01722: invalid number
>
> But if you wanted to only take the characters before the comma, and
> discard the thousands part, you can do so on PostgreSQL with:
>
> select to_number('123,000', '999G');
> to_number
> -----------
> 123
> (1 row)

I can see that it's hard to change this one, because it's more likely
to break existing applications that were written by people who didn't
think of the format string as limiting the maximum size of the number.
I think someone could argue for changing that too, but I agree with
your choice for this patch.

> == Testing ==
>
> Tested on Windows with MinGW using the latest checkout from master.
> Added regression tests to check for this new behavior. All existing
> tests pass.

The tests you added pass for me but the int8 test now fails with the
following (this is from regression.diff after running
'PG_REGRESS_DIFF_OPTS="-dU10" make check'). It looks like some new
whitespace is appearing on the right in the output of to_char(). I
didn't try to understand why.

@@ -453,34 +453,34 @@
------------------+------------------
4567890123456789 | 4567890123456789
(1 row)

-- TO_CHAR()
--
SELECT '' AS to_char_1, to_char(q1, '9G999G999G999G999G999'),
to_char(q2, '9,999,999,999,999,999')
FROM INT8_TBL;
to_char_1 | to_char | to_char
-----------+------------------------+------------------------
- | 123 | 456
+ | 123 | 456
| 123 | 4,567,890,123,456,789
- | 4,567,890,123,456,789 | 123
+ | 4,567,890,123,456,789 | 123
| 4,567,890,123,456,789 | 4,567,890,123,456,789
| 4,567,890,123,456,789 | -4,567,890,123,456,789
(5 rows)

SELECT '' AS to_char_2, to_char(q1, '9G999G999G999G999G999D999G999'),
to_char(q2, '9,999,999,999,999,999.999,999')
FROM INT8_TBL;
to_char_2 | to_char | to_char
-----------+--------------------------------+--------------------------------
- | 123.000,000 | 456.000,000
+ | 123.000,000 | 456.000,000
| 123.000,000 | 4,567,890,123,456,789.000,000
- | 4,567,890,123,456,789.000,000 | 123.000,000
+ | 4,567,890,123,456,789.000,000 | 123.000,000
| 4,567,890,123,456,789.000,000 | 4,567,890,123,456,789.000,000
| 4,567,890,123,456,789.000,000 | -4,567,890,123,456,789.000,000
(5 rows)

SELECT '' AS to_char_3, to_char( (q1 * -1), '9999999999999999PR'),
to_char( (q2 * -1), '9999999999999999.999PR')
FROM INT8_TBL;
to_char_3 | to_char | to_char
-----------+--------------------+------------------------
| <123> | <456.000>
| <123> | <4567890123456789.000>

======================================================================

One superficial comment after first glimpse at the patch:

+ if(!strncmp(Np->inout_p, Np->L_thousands_sep, separator_len))

I believe the usual coding around here would be if (strncmp(...) == 0)

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-08-17 01:01:36 Re: Function to move the position of a replication slot
Previous Message Peter Eisentraut 2017-08-17 00:22:27 Re: [COMMITTERS] pgsql: Include foreign tables in information_schema.table_privileges