Re: Strange behavior with polygon and NaN

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: gkokolatos(at)pm(dot)me
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, bruce(at)momjian(dot)us, sbjesse(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Strange behavior with polygon and NaN
Date: 2020-09-10 09:37:09
Message-ID: 20200910.183709.941009591692803587.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Georgios.

At Mon, 07 Sep 2020 12:46:50 +0000, gkokolatos(at)pm(dot)me wrote in
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, 27 August 2020 14:24, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> > At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane tgl(at)sss(dot)pgh(dot)pa(dot)us wrote in
> >
> > > Kyotaro Horiguchi horikyota(dot)ntt(at)gmail(dot)com writes:
> > >
> > > > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian bruce(at)momjian(dot)us wrote in
> > > >
> > > > > I can confirm that this two-month old email report still produces
> > > > > different results with indexes on/off in git master, which I don't think
> > > > > is ever correct behavior.
> > >
> > > > I agree to that the behavior is broken.
> > >
> > > Yeah, but ... what is "non broken" in this case? I'm not convinced
> > > that having point_inside() return zero for any case involving NaN
> > > is going to lead to noticeably saner behavior than today.
> >
> > Yes, just doing that leaves many unfixed behavior come from NaNs, but
> > at least it seems to me one of sane definition candidates that a point
> > cannot be inside a polygon when NaN is involved. It's similar to
> > Fpxx() returns false if NaN is involved. As mentioned, I had't fully
> > checked and haven't considered this seriously, but I changed my mind
> > to check all the callers. I started checking all the callers of
> > point_inside, then finally I had to check all functions in geo_ops.c:(
> >
>
> For what is worth, I agree with this definition.

Thanks.

> > The attached is the result as of now.
> >
> > === Resulting behavior
> >
> > With the attached:
> >
> > - All boolean functions return false if NaN is involved.
> > - All float8 functions return NaN if NaN is involved.
> > - All geometric arithmetics return NaN as output if NaN is involved.
>
> Agreed! As in both this behaviour conforms to the definition above and the patch provides this behaviour with the exceptions below.
>
> >
> > With some exceptions:
> >
> > - line_eq: needs to consider that NaNs are equal each other.
> > - point_eq/ne (point_eq_pint): ditto
> > - lseg_eq/ne: ditto
...
> > === pg_hypot mistake?
> >
> > I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but
> > I think NaN is appropriate here since other operators behaves that
> > way. This change causes a change of distance between point(1e+300,Inf)
> > and line (1,-1,0) from infinity to NaN, which I think is correct
> > because the arithmetic generates NaN as an intermediate value.
> >
> > === Infinity annoyances
> >
> > Infinity makes some not-great changes in regresssion results. For example:
> >
> > - point '(1e+300,Infinity)' <-> path '((10,20))' returns
> > NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path
> > '[(1,2),(3,4)]' returns Infinity. The difference of the two
> > expressions is whether (0 * Inf = NaN) is performed or not. The
> > former performs it but that was concealed by not propagating NaN to
> > upper layer without the patch.
>
> Although I understand the reasoning for this change. I am not certain I agree with the result. I feel that:
> point '(1e+300,Infinity)' <-> path '((10,20))'
> should return Infinity. Even if I am wrong to think that, the two results should be expected to behave the same. Am I wrong on that too?

No. Actually that's not correct and that just comes from avoiding
special code paths for Infinity. I put more thought on
line_interpt_line and found that that issue is "fixed" by just
simplifying formulas by removing invariants. But one more if-block is
needed to make the function work a symmetrical way, though..

However, still we have a similar issue.

point '(Infinity,1e+300)' <-> line '{-1,0,5}' => Infinity
point '(Infinity,1e+300)' <-> line '{0,-1,5}' => NaN
point '(Infinity,1e+300)' <-> line '{1,1,5}' => NaN

The second should be 1e+300 and the third infinity. This is because
line_closept_point taking the distance between the foot of the
perpendicular line from the point and the point. We can fix the second
case by adding special code paths for vertical and horizontal lines,
but the third needs another special code path explicitly identifying
Infinity. It seems a kind of too-much..

Finally, I gave up fixing that and added doucmentation.

As another issue, (point '(Infinity, 1e+300)' <-> path '((10,20))')
results in NaN. That is "fixed" by adding a special path for "m ==
0.0" case, but I'm not sure it's worth doing..

By the way I found that float8_div(<normal number>, infinity) erros
out as underflow. It is inconsistent with the behavior that float8_mul
doesn't error-out as overflow when Infinity is given. So fixed it.

> > - This is not a difference caused by this patch, but for both patched
> > and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN,
> > which should be 1e+300. However, the behavior comes from arithmetic
> > reasons and wouldn't be a problem.
> >
> > create_index.out is changed since point(NaN,NaN) <@ polygon changed
> > from true to false, which seems rather saner.
> >
> > I haven't checked unchanged results but at least changed results seems
> > saner to me.
>
> All in all a great patch!
>
> It is clean, well reasoned and carefully crafted.
>
> Do you think that the documentation needs to get updated to the 'new' behaviour?

Hmm. I'm not sure we can guarantee the behavior as documented, but I
tried writing in functions-geometry.html.

> NaN and Infinity make geometric functions and operators behave
> inconsistently. Geometric operators or functions that return a boolean
> return false for operands that contain NaNs. Number-returning
> functions and operators return the NaN in most cases but sometimes
> return a valid value if no NaNs are met while
> calculation. Object-returning ones yield an object that contain NaNs
> depending to the operation. Likewise the objects containing Infinity
> can make geometric operators and functions behave inconsistently. For
> example (point '(Infinity,Infinity)' <-> line '{-1,0,5}') is Infinity
> but (point '(Infinity,Infinity)' <-> line '{0,-1,5}') is NaN. It can
> never be a value other than these, but you should consider it
> uncertain how geometric operators behave for objects containing
> Infinity.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v2-0001-Fix-NaN-handling-of-some-geometric-operators-and-.patch text/x-patch 108.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-09-10 09:51:30 Re: PATCH: Attempt to make dbsize a bit more consistent
Previous Message Kasahara Tatsuhito 2020-09-10 09:29:09 Re: autovac issue with large number of tables