Re: Inaccurate results from numeric ln(), log(), exp() and pow()

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inaccurate results from numeric ln(), log(), exp() and pow()
Date: 2015-11-14 12:29:15
Message-ID: CAEZATCUxntMMpirtDe6-_U5rtxUR4aqRt7t6F9eMaczNArmwdg@mail.gmail.com
Lists: pgsql-hackers

On 13 November 2015 at 21:34, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> On 13 November 2015 at 18:36, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Next question: in the additional range-reduction step you added to ln_var,
>> why stop there, ie, what's the rationale for this magic number:
>>
>> if (Abs((x.weight + 1) * DEC_DIGITS) > 10)
>>
>> Seems like we arguably should do this whenever the weight isn't zero,
>> so as to minimize the number of sqrt() steps. (Yes, I see the point
>> about not getting into infinite recursion, but that only says that
>> the threshold needs to be more than 10, not that it needs to be 10^10.)
>>
>
> It's a bit arbitrary. There is a tradeoff here -- computing ln(10) is
> more expensive than doing a sqrt() since the Babylonian algorithm used
> for sqrt() is quadratically convergent, whereas the Taylor series for
> ln() converges more slowly. At the default precision, ln(10) is around
> 7 times slower than sqrt() on my machine, although that will vary with
> precision, and the sqrt()s increase the local rscale and so they will
> get slower. Anyway, it seemed reasonable to not do the extra ln()
> unless it was going to save at least a couple of sqrt()s.
>
>
>> Also, it seems a little odd to put the recursive calculation of ln(10)
>> where you did, rather than down where it's used, ie why not
>>
>> mul_var(result, &fact, result, local_rscale);
>>
>> ln_var(&const_ten, &ln_10, local_rscale);
>> int64_to_numericvar((int64) pow_10, &ni);
>> mul_var(&ln_10, &ni, &xx, local_rscale);
>> add_var(result, &xx, result);
>>
>> round_var(result, rscale);
>>
>> As you have it, ln_10 will be calculated with possibly a smaller rscale
>> than is used in this stanza. That might be all right but it seems dubious
>> --- couldn't the lower-precision result leak into digits we care about?
>>
>
> Well it still has 8 digits more than the final rscale, so it's
> unlikely to cause any loss of precision when added to the result

I thought I'd try an extreme test of that claim, so I ran the
following example:

select ln(2.0^435411);
           ln
-------------------------
 301803.9070347863471187

The input to ln() in this case is truly huge (possibly larger than we
ought to allow), and results in pow_10 = 131068 in ln_var(). So we
compute ln(10) with 8 extra digits of precision, multiply that by
131068, effectively shifting it up by 6 decimal digits, leaving a
safety margin of 2 extra digits at the lower end, before we add that
to the result. And the above result is indeed correct according to bc
(just don't try to compute it using l(2^435411), or you'll be waiting
a long time :-).
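
As an independent cross-check (not part of the patch), ln(2.0^435411)
is just 435411 * ln(2), so even double precision is enough to confirm
the leading digits; a throwaway C snippet along these lines prints a
value agreeing with the result above to around 15 significant digits:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    /*
     * ln(2^435411) = 435411 * ln(2).  A double only carries ~16
     * significant digits, but that is enough to confirm the leading
     * digits of the numeric result above.
     */
    printf("%.10f\n", 435411.0 * log(2.0));
    return 0;
}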

That leaves me feeling pretty happy about that aspect of the
computation because in all realistic cases pow_10 ought to fall way
short of that, so there's a considerable margin of safety.

I'm much less happy with the sqrt() range reduction step though. I now
realise that the whole "increment local_rscale before each sqrt()"
approach is totally bogus. Like elsewhere in this patch, what we ought
to be doing is ensuring that the number of significant digits remains
fixed as the weight of x is reduced. Given the slow rate of increase
of logarithms as illustrated above, we only need to keep rscale + a
few significant digits during the sqrt() range reduction. I tried the
following:

    /*
     * Use sqrt() to reduce the input to the range 0.9 < x < 1.1.
     *
     * The final logarithm will have up to around rscale+6 significant digits.
     * Each sqrt() will roughly halve the weight of x, so adjust the local
     * rscale as we work so that we keep this many significant digits at each
     * step (plus a few more for good measure).
     */
    while (cmp_var(&x, &const_zero_point_nine) <= 0)
    {
        sqrt_rscale = rscale - x.weight * DEC_DIGITS / 2 + 8;
        sqrt_var(&x, &x, sqrt_rscale);
        mul_var(&fact, &const_two, &fact, 0);
    }
    while (cmp_var(&x, &const_one_point_one) >= 0)
    {
        sqrt_rscale = rscale - x.weight * DEC_DIGITS / 2 + 8;
        sqrt_var(&x, &x, sqrt_rscale);
        mul_var(&fact, &const_two, &fact, 0);
    }

and it passed all the tests (including the extreme case above) with
the power-10 range reduction step disabled.

So repeated use of sqrt() can be used for range reduction, even in
extreme cases, and it won't lose precision if it's done right. In
fact, in the worst case there are only 22 sqrt()s by my count.
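
(Sketching where that count comes from, using my own rough numbers
rather than anything in the patch: each sqrt() halves ln(x), and the
loops stop once 0.9 < x < 1.1, which is certainly the case once
|ln(x)| < ln(1.1) ~= 0.095. The largest inputs numeric allows have
around 131072 digits before the decimal point, so |ln(x)| is at most a
bit over 3e5, and 3e5 / 2^22 ~= 0.072 is below 0.095, whereas dividing
by only 2^21 is not quite enough.)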

Note that this also reduces the rscale used in the Taylor series,
since local_rscale is no longer being increased above its original
value of rscale+8. I think that's OK for largely the same reasons --
the result of the Taylor series is multiplied by a factor of at most
2^22 (and usually much less than that), so the 8 extra digits ought to
be sufficient, although that could be upped a bit if you really wanted
to be sure.
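
(For the record, 2^22 is about 4.2 million, so by my arithmetic that
final multiplication by fact can eat at most 7 of those 8 extra
digits, which is why I only claim "ought to be sufficient" -- there's
only around 1 digit to spare in the very worst case.)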

We might well want to keep the power-10 argument reduction step, but
it would now be there purely on performance grounds, so there's scope
for playing around with the threshold at which it kicks in.

Regards,
Dean
