Re: numeric_big in make check?

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: numeric_big in make check?
Date: 2024-02-20 13:23:31
Message-ID: CAEZATCXVQQaRAPz34-B_j0H5f8baD-j+BLgMLd523h+N55Oa-Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, 19 Feb 2024 at 15:35, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I thought I'd try to acquire some actual facts here, so I compared
> the code coverage shown by "make check" as of HEAD, versus "make
> check" after adding numeric_big to parallel_schedule. I saw the
> following lines of numeric.c as being covered in the second run
> and not the first:
>

I don't think that tells the whole story. Code coverage only tells you
whether a particular line of code has been hit, not whether it has
been properly tested with all the values that might lead to different
cases. For example:

> sqrt_var():
> 10073 res_ndigits = res_weight + 1 - (-rscale - 1) / DEC_DIGITS;
>

To get code coverage of this line, all you need to do is ensure that
sqrt_var() is called with rscale < -1 (which can only happen from the
range-reduction code in ln_var()). You could do that by computing
ln(1e50), which results in calling sqrt_var() with rscale = -2,
causing that line of code in sqrt_var() to be hit. That would satisfy
code coverage, but I would argue that you've only really tested that
line of code properly if you also hit it with rscale = -3, and
probably a few more values, to check that the round-down division is
working as intended.
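
To make that concrete, queries along these lines would exercise that
path (illustrative only, not the exact tests I'd propose):

    -- ln() of a large input goes through the range reduction in
    -- ln_var(), the only caller that passes sqrt_var() an rscale < -1
    SELECT ln(1e50::numeric);   -- reaches sqrt_var() with rscale = -2
    SELECT ln(1e100::numeric);  -- a larger input, for other rscales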

Similarly, looking at commit 18a02ad2a5, the crucial code change was
the following in power_var():

- val = Max(val, -NUMERIC_MAX_RESULT_SCALE);
- val = Min(val, NUMERIC_MAX_RESULT_SCALE);
val *= 0.434294481903252; /* approximate decimal result weight */

Any test that calls numeric_power() is going to hit those lines of
code, but to see a failure, it was necessary to hit them with the
absolute value of "val" greater than NUMERIC_MAX_RESULT_SCALE, which
is why that commit added 2 new test cases to numeric_big, calling
power_var() with "val" outside that range. Code coverage is never
going to tell you whether or not that is being tested, since the code
change was to delete lines. Even if that weren't the case, any line of
code involving Min() or Max() has 2 branches, and code coverage won't
tell you if you've hit both of them.
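
For the record, inputs of roughly this shape are what's needed to get
"val" outside that range (illustrative only -- the actual cases are the
ones that commit added to numeric_big.sql):

    -- a result with thousands of digits pushes "val" well beyond
    -- NUMERIC_MAX_RESULT_SCALE; length() just keeps the output short
    SELECT length(power(10.0::numeric, 5000::numeric)::text);

and similarly on the negative side, with a result small enough to drive
"val" a long way below zero.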

> Pretty poor return on investment for the runtime consumed. I don't
> object to adding something to numeric.sql that's targeted at adding
> coverage for these (or anyplace else that's not covered), but let's
> not just throw cycles at the problem.
>

I agree that blindly performing a bunch of large computations (just
throwing cycles at the problem) is not a good approach to testing. And
maybe that's a fair characterisation of parts of numeric_big. However,
it also contains some fairly well-targeted tests intended to test
specific edge cases that only occur with particular ranges of inputs,
which don't necessarily show up as increased code coverage.

So I think this requires more careful analysis. Simply deleting
numeric_big and adding tests to numeric.sql until the same level of
code coverage is achieved will not give the same level of testing.

It's also worth noting that the cost of running numeric_big has come
down very significantly over the last few years, as can be seen by
running the current numeric_big script against old backends. There's a
lot of random variation in the timings, but the trend is very clear:

9.5 1.641s
9.6 0.856s
10 0.845s
11 0.750s
12 0.760s
13 0.672s
14 0.430s
15 0.347s
16 0.336s

Arguably, this is a useful benchmark to spot performance regressions
and test proposed performance-improving patches.
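
The comparison is easy to repeat -- just run the current script against
each server version, for example in a psql session with timing enabled:

    \timing on
    \i src/test/regress/sql/numeric_big.sql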

If I run "EXTRA_TESTS=numeric_big make check | grep 'ok ' | sort
-nrk5", numeric_big is not in the top 10 most expensive tests (it's
usually down at around 15th place).

Looking at the script itself, the addition, subtraction,
multiplication and division tests at the top are probably pointless,
since I would expect those operations to be tested adequately (and
probably more thoroughly) by the transcendental test cases. In fact, I
think it would probably be OK to delete everything above line 650, and
just keep the bottom half of the script -- the pow(), exp(), ln() and
log() tests, which cover various edge cases, as well as exercising
basic arithmetic operations internally. We might want to check that
I/O of large numerics is still being tested properly though.
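
By that I mean something like the following (illustrative only, not
proposed test wording) -- a value with hundreds of digits on both sides
of the decimal point, round-tripped through numeric_in()/numeric_out():

    -- the sort of large-value I/O that the top half of the script
    -- currently exercises in passing
    SELECT (repeat('9', 800) || '.' || repeat('9', 800))::numeric;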

If we did that, numeric_big would be even further down the list of
expensive tests, and I'd say it should be run by default.

Regards,
Dean
