Re: proposal: minscale, rtrim, btrim functions for numeric

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Marko Tiikkaja <marko(at)joh(dot)to>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: proposal: minscale, rtrim, btrim functions for numeric
Date: 2019-12-08 01:23:25
Message-ID: 20191207192325.343781fc@slate.karlpinc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Pavel,

On Mon, 11 Nov 2019 15:47:37 +0100
Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> Here is a patch. It's based on Dean's suggestions.
>
> I implemented two functions - first minscale, second trim_scale. The
> overhead of second is minimal - so I think it can be good to have it.
> I started design with the name "trim_scale", but the name can be any
> other.

Here are my thoughts on your patch.

My one substantial criticism is that I believe that
trim_scale('NaN'::numeric) should return NaN.
So the test output should look like:

template1=# select trim_scale('nan'::numeric) = 'nan'::numeric;
?column?
----------
t
(1 row)

FWIW, I bumped around the Internet and looked at Oracle docs to see if
there's any reason why minscale() might not be a good function name.
I couldn't find any problems.

I also couldn't think of a better name than trim_scale() and don't
have any problems with the name.

My other suggestions mostly have to do with documentation. Your code
looks pretty good to me, looks like the existing code, you name
variables and function names as in existing code, etc.

I comment on various hunks in line below:

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 28eb322f3f..6f142cd679 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -918,6 +918,19 @@
<entry><literal>6.0000000000</literal></entry>
</row>

+ <row>
+ <entry>
+ <indexterm>
+ <primary>minscale</primary>
+ </indexterm>
+
<literal><function>minscale(<type>numeric</type>)</function></literal>
+ </entry>
+ <entry><type>integer</type></entry>
+ <entry>returns minimal scale of the argument (the number of
decimal digits in the fractional part)</entry>
+ <entry><literal>scale(8.4100)</literal></entry>
+ <entry><literal>2</literal></entry>
+ </row>
+
<row>
<entry>
<indexterm>

*****
Your description does not say what the minimal scale is. How about:

minimal scale (number of decimal digits in the fractional part) needed
to store the supplied value without data loss
*****

@@ -1041,6 +1054,19 @@
<entry><literal>1.4142135623731</literal></entry>
</row>

+ <row>
+ <entry>
+ <indexterm>
+ <primary>trim_scale</primary>
+ </indexterm>
+
<literal><function>trim_scale(<type>numeric</type>)</function></literal>
+ </entry>
+ <entry><type>numeric</type></entry>
+ <entry>reduce scale of the argument (the number of decimal
digits in the fractional part)</entry>
+ <entry><literal>scale(8.4100)</literal></entry>
+ <entry><literal>8.41</literal></entry>
+ </row>
+
<row>
<entry>
<indexterm>

****
How about:

reduce the scale (the number of decimal digits in the fractional part)
to the minimum needed to store the supplied value without data loss
*****

diff --git a/src/backend/utils/adt/numeric.c
b/src/backend/utils/adt/numeric.c index a00db3ce7a..35234aee4c 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c

****
I believe the hunks in this file should start at about line# 3181.
This is right after numeric_scale(). Seems like all the scale
related functions should be together.

There's no hard standard but I don't see why lines (comment lines in
your case) should be longer than 78 characters without good reason.
Please reformat.
****

@@ -5620,6 +5620,88 @@ int2int4_sum(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(Int64GetDatumFast(transdata->sum));
}

+/*
+ * Calculate minimal display scale. The var should be stripped already.

****
I think you can get rid of the word "display" in the comment.
****

+ */
+static int
+get_min_scale(NumericVar *var)
+{
+ int minscale = 0;
+
+ if (var->ndigits > 0)
+ {
+ NumericDigit last_digit;
+
+ /* maximal size of minscale, can be lower */
+ minscale = (var->ndigits - var->weight - 1) *
DEC_DIGITS; +
+ /*
+ * When there are not digits after decimal point, the
previous expression

****
s/not/no/
****

+ * can be negative. In this case, the minscale must be
zero.
+ */

****
s/can be/is/
****

+ if (minscale > 0)
+ {
+ /* reduce minscale if trailing digits in last
numeric digits are zero */
+ last_digit = var->digits[var->ndigits - 1];
+
+ while (last_digit % 10 == 0)
+ {
+ minscale--;
+ last_digit /= 10;
+ }
+ }
+ else
+ minscale = 0;
+ }
+
+ return minscale;
+}
+
+/*
+ * Returns minimal scale of numeric value when value is not changed

****
Improve comment, something like:
minimal scale required to represent supplied value without loss
****

+ */
+Datum
+numeric_minscale(PG_FUNCTION_ARGS)
+{
+ Numeric num = PG_GETARG_NUMERIC(0);
+ NumericVar arg;
+ int minscale;
+
+ if (NUMERIC_IS_NAN(num))
+ PG_RETURN_NULL();
+
+ init_var_from_num(num, &arg);
+ strip_var(&arg);
+
+ minscale = get_min_scale(&arg);
+ free_var(&arg);
+
+ PG_RETURN_INT32(minscale);
+}
+
+/*
+ * Reduce scale of numeric value so value is not changed

****
Likewise, comment text could be improved
****

+ */
+Datum
+numeric_trim_scale(PG_FUNCTION_ARGS)
+{
+ Numeric num = PG_GETARG_NUMERIC(0);
+ Numeric res;
+ NumericVar result;
+
+ if (NUMERIC_IS_NAN(num))
+ PG_RETURN_NULL();
+
+ init_var_from_num(num, &result);
+ strip_var(&result);
+
+ result.dscale = get_min_scale(&result);
+
+ res = make_result(&result);
+ free_var(&result);
+
+ PG_RETURN_NUMERIC(res);
+}

/*
----------------------------------------------------------------------
* diff --git a/src/include/catalog/pg_proc.dat
b/src/include/catalog/pg_proc.dat index 58ea5b982b..e603a5d8dd 100644

****
How about moving these new lines to right after line# 4255, the
scale() function?
****

--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -4288,6 +4288,12 @@
proname => 'width_bucket', prorettype => 'int4',
proargtypes => 'numeric numeric numeric int4',
prosrc => 'width_bucket_numeric' },
+{ oid => '3434', descr => 'returns minimal scale of numeric value',

****
How about a descr of?:

minimal scale needed to store the supplied value without data loss
****

+ proname => 'minscale', prorettype => 'int4', proargtypes =>
'numeric',
+ prosrc => 'numeric_minscale' },
+{ oid => '3435', descr => 'returns numeric value with minimal scale',

****
And likewise a descr of?:

numeric with minimal scale needed to represent the given value
****

+ proname => 'trim_scale', prorettype => 'numeric', proargtypes =>
'numeric',
+ prosrc => 'numeric_trim_scale' },

{ oid => '1747',
proname => 'time_pl_interval', prorettype => 'time',
diff --git a/src/test/regress/expected/numeric.out
b/src/test/regress/expected/numeric.out index 1cb3c3bfab..778c204b13
100644

****
I have suggestions:

Give the 2 functions separate comments (-- Tests for minscale() and
-- Tests for trim_scale())

Put () after the function names in the comments
because that's what scale() does.

Move the lines so the tests are right after the tests of scale().

Be explicit when testing for NULL or NaN. I don't know that this is
consistent with the rest of the regression tests but I don't see how
being explicit could be wrong. Otherwise NULL and NaN are output the
same ("") and you're not really testing.

So test with expressions like "foo(NULL) IS NULL" or
"foo('NaN'::NUMERIC) = 'NaN::NUMERIC" and look for t (or f) results.

****

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-12-08 05:14:32 Re: Windows buildfarm members vs. new async-notify isolation test
Previous Message Grigory Smolkin 2019-12-08 01:03:01 Re: [proposal] recovery_target "latest"