Re: Strange behavior with polygon and NaN

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: bruce(at)momjian(dot)us, sbjesse(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Strange behavior with polygon and NaN
Date: 2020-08-27 11:24:12
Message-ID: 20200827.202412.479356179938754961.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian <bruce(at)momjian(dot)us> wrote in
> >> I can confirm that this two-month old email report still produces
> >> different results with indexes on/off in git master, which I don't think
> >> is ever correct behavior.
>
> > I agree to that the behavior is broken.
>
> Yeah, but ... what is "non broken" in this case? I'm not convinced
> that having point_inside() return zero for any case involving NaN
> is going to lead to noticeably saner behavior than today.

Yes, just doing that leaves many unfixed behavior come from NaNs, but
at least it seems to me one of sane definition candidates that a point
cannot be inside a polygon when NaN is involved. It's similar to
Fpxx() returns false if NaN is involved. As mentioned, I had't fully
checked and haven't considered this seriously, but I changed my mind
to check all the callers. I started checking all the callers of
point_inside, then finally I had to check all functions in geo_ops.c:(

The attached is the result as of now.

=== Resulting behavior

With the attached:

- All boolean functions return false if NaN is involved.
- All float8 functions return NaN if NaN is involved.
- All geometric arithmetics return NaN as output if NaN is involved.

With some exceptions:
- line_eq: needs to consider that NaNs are equal each other.
- point_eq/ne (point_eq_pint): ditto
- lseg_eq/ne: ditto

The change makes some difference in the regression test.
For example,

<obj containing NaN> <-> <any obj> changed from 0 to NaN. (distance)
<obj containing NaN> <@ <any obj> changed from true to false. (contained)
<obj containing NaN> <-> <any obj> changed from 0 to NaN. (distance)
<obj containing NaN> ?# <any obj> changed from true to false (overlaps)

=== pg_hypot mistake?

I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but
I think NaN is appropriate here since other operators behaves that
way. This change causes a change of distance between point(1e+300,Inf)
and line (1,-1,0) from infinity to NaN, which I think is correct
because the arithmetic generates NaN as an intermediate value.

=== Infinity annoyances

Infinity makes some not-great changes in regresssion results. For example:

- point '(1e+300,Infinity)' <-> path '((10,20))' returns
NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path
'[(1,2),(3,4)]' returns Infinity. The difference of the two
expressions is whether (0 * Inf = NaN) is performed or not. The
former performs it but that was concealed by not propagating NaN to
upper layer without the patch.

- Without the patch, point '(1e+300,Infinity)' ## box '(2,2),(0,0)'
generates '(0,2)', which is utterly wrong. It is because
box_closept_point evaluates float8_lt(Inf, NaN) as true(!) and sets
the wrong point for distance=NaN is set. With the patch, the NaN
makes the result NULL.

- This is not a difference caused by this patch, but for both patched
and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN,
which should be 1e+300. However, the behavior comes from arithmetic
reasons and wouldn't be a problem.

create_index.out is changed since point(NaN,NaN) <@ polygon changed
from true to false, which seems rather saner.

I haven't checked unchanged results but at least changed results seems
saner to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v1-0001-Fix-NaN-handling-of-some-geometric-operators-and-.patch text/x-patch 43.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2020-08-27 11:26:45 Re: Parallel copy
Previous Message John Naylor 2020-08-27 11:11:51 Re: factorial function/phase out postfix operators?