Re: Review: Patch for phypot - Pygmy Hippotause

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Andrew Geery" <andrew(dot)geery(at)gmail(dot)com>, plm(at)netspace(dot)net(dot)au, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Review: Patch for phypot - Pygmy Hippotause
Date: 2010-07-17 19:19:05
Message-ID: 12182.1279394345@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Andrew Geery <andrew(dot)geery(at)gmail(dot)com> wrote:
>> I found that the difference in the two calculations were always
>> less than 0.000001. However, about a third of the calculations
>> differed at one more magnitude of precision (that is, there were
>> differences in the calculations that were greater than 0.0000001).

> That's harder for me to evaluate in terms of whether it's
> acceptable. It *is* an *approximate* data type, and the differences
> are all less than 0.0001%; however, that seems like a pretty weak
> guarantee if it's making the answer less accurate. If we could take
> some of these cases with relatively large differences and see which
> of the calculations is more accurate, that might help make a
> decision. I'm not sure which technique would tend to be more
> accurate. Since they're algebraically equivalent, and what we're
> using now pushes toward underflow and overflow more readily, it
> seems possible that the differences will generally reflect a greater
> accuracy in the patch's technique.

Hm ... it's been twenty years since I did any serious numerical analysis
hacking, but ... offhand this looks to me like it's about as accurate as
the straightforward way, not really better or worse. Ignoring
overflow/underflow, the basic knock on the naive expression is that if x
is much bigger than y, you lose most or all of the significant digits in
y when you add their squares. For instance, if x is 1e8 * y, then y*y
fails to affect the sum at all (given typical float8 arithmetic), and
you'll get back sqrt(x*x) even though y should have been able to affect
the result at the 8th place or so. In the patch's calculation, y/x is
computed accurately but then we'll lose the same precision when we form
1 + yx*yx --- the result will be just 1 if y is lots smaller than x.

If we were feeling tense about this, we could look for an alternate way
of calculating sqrt(1 + yx*yx) that doesn't lose so much accuracy.
In principle I think that's doable since this expression is related to
ln(1+x) which can be calculated accurately even for very small x.
I'm not convinced that it's worth troubling over though, seeing that no
real attention has been paid to numerical stability anywhere else in the
geometric functions.

I think the patch is good in principle; what could be improved about it
is:

1. It should just redefine HYPOT(x,y) as pg_hypot(x,y), rather than
touching all the call sites --- not to mention possibly breaking
third-party code depending on the HYPOT macro. (That possibility
also leads me to think that the function shouldn't be static, but
should be exported in geo_decls.h.)

2. The comments in the new function leave something to be desired, eg
the discussion of the zero case is as clear as mud IMO.

BTW, the function comment claims that SUS requires a NAN result for
hypot(NAN,INF), but I don't believe it. If it did say that it would
be contrary to ISO C:

-- hypot(x, y) returns +INF if x is infinite, even if y is a
NaN.

Anyway, if you read SUS' HUGE_VAL as meaning INF, that clause comes
before the one about NAN so I think it's saying the same thing as the
other standards.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Conway 2010-07-17 19:22:01 Re: Review: Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle
Previous Message Kevin Grittner 2010-07-17 19:09:06 Re: Review: Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle