Re: Inaccurate results from numeric ln(), log(), exp() and pow()

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Inaccurate results from numeric ln(), log(), exp() and pow()
Date: 2015-09-20 16:53:07
Message-ID: CAEZATCVit3zjGinEits1ZTmCkdGvXxbbSqJeUXZJYD5HDrtwxg@mail.gmail.com
Lists: pgsql-hackers

On 16 September 2015 at 15:32, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> FWIW, in that particular example I'd happily take the 27ms time to get
> the more accurate answer. If it were 270ms, maybe not. I think my
> initial reaction to this patch is "are there any cases where it makes
> things 100x slower ... especially for non-outrageous inputs?" If not,
> sure, let's go for more accuracy.
>

On 16 September 2015 at 17:03, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> I'll try to do some more comprehensive performance testing over the
> next few days.
>

I've done some more performance testing, and the results are broadly
in line with my initial expectations. There are a couple of cases
where pow() with non-integer powers is hundreds of times slower. This
happens when inputs with a small number of significant digits generate
results with thousands of digits, which the code in HEAD doesn't
calculate accurately. However, there do not appear to be any cases
where this happens for "non-outrageous" inputs.
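To see why a few input digits can demand a result with thousands of
digits, consider the size of the integer part of a power result. A
quick back-of-envelope check (the example values here are mine, not
taken from the test script):

```python
import math

# Hypothetical inputs (not from the test script): a base with six
# significant digits raised to a non-integer power with four.
base, exponent = 123.456, 789.1

# The integer part of base**exponent alone has
# floor(exponent * log10(base)) + 1 decimal digits.
digits_before_point = math.floor(exponent * math.log10(base)) + 1
print(digits_before_point)
```

So an exact answer already needs well over a thousand digits before
the decimal point, before any fractional digits are computed, which is
where the large slowdowns for such inputs come from.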

There are also cases where the new code is hundreds or even thousands
of times faster, mainly due to it making better choices for the local
rscale, and the reduced use of sqrt() in ln_var().
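For context, ln_var() belongs to the family of algorithms that first
shrink the argument towards 1 with repeated square roots and then
evaluate a fast-converging series, so cutting down on the reduction
step directly cuts sqrt() calls. A rough float-precision sketch of the
general technique (this is the textbook shape, not the patched
numeric.c code):

```python
import math

def ln_by_sqrt_reduction(x, tol=1e-15):
    """Textbook sketch of a ln_var()-style algorithm: repeatedly take
    square roots to pull x close to 1 (each sqrt halves ln(x)), then
    sum the fast-converging series ln(x) = 2*atanh((x-1)/(x+1)), and
    finally scale the result back up by 2**k."""
    assert x > 0
    k = 0
    while abs(x - 1.0) > 0.1:
        x = math.sqrt(x)        # each sqrt call is expensive at high precision
        k += 1
    z = (x - 1.0) / (x + 1.0)   # small, so the series below converges quickly
    z2 = z * z
    term, total, n = z, 0.0, 1
    while abs(term) > tol:      # atanh series: z + z**3/3 + z**5/5 + ...
        total += term / n
        term *= z2
        n += 2
    return 2.0 * total * (2 ** k)
```

Every sqrt() avoided in the reduction loop is a saving that grows with
the working precision, which is why the reduced sqrt() use shows up so
clearly in the timings.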

I wrote a script to test each function with a range of inputs, some
straightforward, and some intentionally difficult to compute. Attached
is the script's output. The columns in the output are:

* Function being called.
* The input(s) passed to it.
* Number of significant digits in the inputs (not counting trailing zeros).
* Number of significant digits in the output (HEAD vs patched code).
* Number of output digits on the right that differ between the two.
* Average function call time in HEAD.
* Average function call time with the patch.
* How many times faster or slower the patched code is.

There is a huge spread of function call times, both before and after
the patch, and the overall performance profile has changed
significantly, but in general the patched code is faster more often
than it is slower, especially for "non-outrageous" inputs.

All the cases where it is significantly slower are cases where the
result is also significantly more accurate, although greater accuracy
doesn't always come at the cost of speed.

These results are based on the attached, updated patch which includes
a few minor improvements. The main changes are:

* In mul_var(), instead of just ripping out the faulty input
truncation code, I've now replaced it with code that correctly
truncates the inputs as much as possible when the exact answer isn't
required. This produces a noticeable speedup in a number of cases. For
example, it reduced the time to compute exp(5999.999) from 27ms to
20ms.
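The principle behind the truncation can be sketched like this (my own
illustration using Python's decimal module, with a guard-digit count I
picked for the example; it is not the patch's code):

```python
from decimal import Decimal, localcontext

def approx_product(a, b, sig_digits, guard=4):
    """Illustrative sketch: when only sig_digits significant digits of
    the product are wanted, each input can first be rounded to
    sig_digits + guard digits. The discarded low-order input digits
    cannot disturb the result digits we keep, and the multiplication
    itself becomes much shorter."""
    with localcontext() as ctx:
        ctx.prec = sig_digits + guard
        ta = +Decimal(a)        # unary plus rounds to context precision
        tb = +Decimal(b)
        prod = ta * tb          # multiply the truncated inputs
    with localcontext() as ctx:
        ctx.prec = sig_digits
        return +prod            # final rounding to the requested digits
```

The real savings come from the fact that schoolbook multiplication
cost grows with the product of the two input lengths, so shortening
both inputs pays off quadratically.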

* Also in mul_var(), the simple measure of swapping the inputs so
that var1 is always the number with fewer digits produces a worthwhile
benefit. This further reduced the time to compute exp(5999.999) to
17ms.
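The reason the swap helps can be seen in a schoolbook multiplication
sketch (illustrative Python only, not the actual mul_var() code;
numeric's digits are base NBASE = 10000): the digit-by-digit products
are the same whichever operand drives the outer loop, but each outer
iteration also pays fixed per-row costs, so fewer outer iterations
over the shorter operand is cheaper.

```python
def schoolbook_mul(var1, var2, base=10000):
    """Schoolbook multiplication of base-10000 digit lists, most
    significant digit first. Looping over the shorter operand in the
    outer loop means fewer rows, and hence less per-row overhead."""
    if len(var1) > len(var2):
        var1, var2 = var2, var1             # make var1 the shorter input
    res = [0] * (len(var1) + len(var2))
    for i in range(len(var1) - 1, -1, -1):  # one "row" per var1 digit
        carry = 0
        for j in range(len(var2) - 1, -1, -1):
            acc = res[i + j + 1] + var1[i] * var2[j] + carry
            carry, res[i + j + 1] = divmod(acc, base)
        res[i] += carry
    return res                              # most significant digit first
```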

There's more that could be done to improve multiplication performance,
but I think that's out of scope for this patch.

Regards,
Dean

Attachment Content-Type Size
perf-test.out application/octet-stream 80.9 KB
numeric.patch text/x-diff 172.9 KB
