Re: Add Roman numeral conversion to to_number

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Oliver Ford <ojford(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Roman numeral conversion to to_number
Date: 2017-09-17 08:11:27
Message-ID: CAF4Au4x0wCtB6V_R57XOCyZ71z+rz3uzAxMtBMJ3QVzBNZJc+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3 Aug 2017 16:29, "Oliver Ford" <ojford(at)gmail(dot)com> wrote:

Adds to the to_number() function the ability to convert Roman numerals
to a number. This feature is on the formatting.c TODO list. It is not
currently implemented in either Oracle, MSSQL or MySQL so gives
PostgreSQL an edge :-)

I see use of this in full text search as a dictionary. It's useful for
indexing and searching historical documents. Probably, better to have as
contrib.

==Usage==

Call: to_number(numerals, 'RN') or to_number(numerals, 'rn').

Example: to_number('MMMXIII', 'RN') returns 3013. to_number('xiv',
'rn') returns 14.

The function is case insensitive for the numerals. It accepts a mix of
cases and treats them the same. So to_number ('MCI, 'rn') and
to_number ('McI', 'RN') both return 1101. The format mask must however
be either 'RN' or 'rn'. If there are other elements in the mask, those
other elements will be ignored. So to_number('MMM', 'FMRN') returns
3000.

Whitespace before the numerals is ignored.

==Validation==

The new function roman_to_int() in formatting.c performs the
conversion. It strictly validates the numerals based on the following
Roman-to-Arabic conversion rules:

1. The power-of-ten numerals (I, X, C, M) can be repeated up to three
times in a row. The beginning-with-5 numerals (V, L, D) can each
appear only once.

2. Subtraction from a power-of-ten numeral cannot occur if a
beginning-with-5 numeral appears later.

3. Subtraction cannot occur if the smaller numeral is less than a
tenth of the greater numeral (so IX is valid, but IC is invalid).

4. There cannot be two subtractions in a row.

5. A beginning-with-5 numeral cannot subtract.

If any of these rules are violated, an error is raised.

==Testing==

This has been tested on a Windows build of the master branch with
MinGW. The included regression tests positively test every value from
1 to 3999 (the Roman numeral max value) by calling the existing
to_char() function to get the Roman value, then converting it back to
an Arabic value. There are also negative tests for each invalid code
path and some positive mixed-case tests.

Documentation is updated to include this new feature.

==References==

http://sierra.nmsu.edu/morandi/coursematerials/RomanNumerals.html
Documents the strict Roman numeral standard.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2017-09-17 10:15:14 Re: SQL/JSON in PostgreSQL
Previous Message Oleg Bartunov 2017-09-17 08:08:11 Re: SQL/JSON in PostgreSQL