Re: Inaccurate results from numeric ln(), log(), exp() and pow()

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inaccurate results from numeric ln(), log(), exp() and pow()
Date: 2015-11-13 08:15:06
Message-ID: CAEZATCWQLzMhH3ZWTXC9vhAX8RsUtVgCCw-3gbZMn3amdAkYtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12 November 2015 at 21:01, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> These results are based on the attached, updated patch which includes
>> a few minor improvements.
>
> I started to look at this patch, and was immediately bemused by the
> comment in estimate_ln_weight:
>
> /*
> * 0.9 <= var <= 1.1
> *
> * Its logarithm has a negative weight (possibly very large). Estimate
> * it using ln(var) = ln(1+x) = x + O(x^2) ~= x.
> */
>
> This comment's nonsense of course: ln(0.9) is about -0.105, and ln(1.1) is
> about 0.095, which is not even negative much less "very large".

That's nonsense. The comment is perfectly correct. It's not saying the
logarithm is negative, it's saying that the *weight* of the logarithm
is negative. For the examples you cite, the weight is small and
negative (around -1), but for inputs closer to 1 it can be large and
negative.

> We could
> just replace that with "For values reasonably close to 1, we can estimate
> the result using ln(var) = ln(1+x) ~= x." I am wondering what's the point
> though: why not flush this entire branch in favor of always using the
> generic case? I rather doubt that on any modern machine two uses of
> cmp_var plus init_var/sub_var/free_var are going to be cheaper than the
> log() call you save. You would need a different way to special-case 1.0,
> but that's not expensive.
>

No, you're missing the point entirely. Suppose var =
1.000000000000000000001, in which case ln(var) is approximately 1e-21.
The code in the first branch uses the approximation ln(var) = ln(1+x)
~= x = var-1 to see that the weight of ln(var) is around -21. You
can't do that with the code in second branch just by looking at the
first couple of digits of var and doing a floating point
approximation, because all you'd see would be 1.0000000 and your
approximation to the logarithm would be orders of magnitude out (as is
the case with the code that this function is replacing, which comes
with no explanatory comments at all).

> Larger questions are
>
> (1) why the Abs() in the specification of estimate_ln_weight --- that
> doesn't comport with the text about "Estimate the weight of the most
> significant digit".

Yes it does, and the formula is correct. The function returns an
estimate for the weight of the logarithm of var. Let L = ln(var), then
what we are trying to return is the weight of L. So we don't care
about the sign of L, we just need to know it's weight -- i.e., we want
approximately log10(Abs(L)) which is log10(Abs(ln(var))) as the
comment says. The Abs() is necessary because L might be negative (when
0 < var < 1).

> (The branch I'm proposing you remove fails to
> honor that anyway.)
>

No it doesn't. It honours the Abs() by ignoring the sign of x, and
just looking at its weight.

> (2) what should the behavior be for input == 1, and why? The code
> is returning zero, but that seems inconsistent or at least
> underdocumented.
>

ln(1) is 0, which has a weight of 0. I suppose you could argue that
technically 0 could have any weight, but looking at the bigger
picture, what this function is for is deciding how many digits of
precision are needed in intermediate calculations to retain full
precision in the final result. When the input is exactly 1, the
callers will have nice exact results, and no extra intermediate
precision is needed, so returning 0 is quite sensible.

I guess adding words to that effect to the comment makes sense, since
it clearly wasn't obvious.

> (3) if it's throwing an error for zero input, why not for negative
> input? I'm not sure that either behavior is within the purview of
> this function, anyway. Seems like returning zero might be fine.
>
> regards, tom lane

[Shrug] It doesn't make much difference since both those error cases
will be caught a little later on in the callers, but since this
function needs to test for an empty digit array in the second branch
anyway, it seemed like it might as well report that error there.
Returning zero would be fine too.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Borodin 2015-11-13 10:41:49 pg_rewind and replication slots
Previous Message Amit Kapila 2015-11-13 07:31:29 Re: checkpointer continuous flushing