Re: Strange behavior with polygon and NaN

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: gkokolatos(at)pm(dot)me, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Strange behavior with polygon and NaN
Date: 2020-11-16 06:16:36
Message-ID: 20201116.151636.571392156451834232.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 13 Nov 2020 11:26:21 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> >> For instance, {1,-1,0} is the line "x = y". We could argue about
> >> whether it'd be sensible to return zero for the distance between that
> >> and the point (inf,inf), but surely any point with one inf and one
> >> finite coordinate must be an infinite distance away from that line.
> >> There's nothing ill-defined about that situation.
>
> > Mmm... (swinging my arms to mimic lines..)
> > dist(x = y, (1e300, Inf)) looks indeterminant to me..
>
> Well, what you're showing is that we get an internal overflow,
> essentially, on the way to calculating the result. Which is true,
> so it's sort of accidental that we got a sensible result before.
> Nonetheless, we *did* get a sensible result, so producing NaN
> instead seems like a regression.

Independently of the discussion, the following was wrong.

> 2. calculate the cross point.
> crosspoint({-1, -1, Inf}, {1,-1,0}) => (Inf, NaN)

The cross point must lie on line 2, that is, x equals y. If we avoid
using x to calculate y, the result comes out right. But that doesn't
"fix" the final result.

> We might need to introduce special-case handling to protect the
> low-level calculations from ever seeing NaN or Inf in their inputs.
> Getting the right answer to "just fall out" of those calculations
> might be an unreasonable hope.

However, as long as we calculate the distance as the distance between
the point and the foot of the perpendicular from the point to the
line, (inf - inf) is inevitable and we cannot avoid that "wrong"
result.
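To illustrate the cross-point issue, here is a minimal C sketch (hypothetical names, not the patch's code) contrasting two ways of intersecting A1*x + B1*y + C1 = 0 with A2*x + B2*y + C2 = 0. Both compute x by Cramer's rule; the first then back-substitutes x into line 1 to get y, while the second computes y by Cramer's rule as well. With line 1 = {-1,-1,Inf} and line 2 = {1,-1,0}, back-substitution evaluates -Inf + Inf and yields (Inf, NaN), whereas the direct form yields (Inf, Inf), which does lie on x = y.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical point type for this sketch */
typedef struct { double x, y; } pt_sketch;

/*
 * Intersection of A1*x + B1*y + C1 = 0 and A2*x + B2*y + C2 = 0.
 * x comes from Cramer's rule; y is then back-substituted through line 1.
 */
static pt_sketch intersect_backsubst(double A1, double B1, double C1,
                                     double A2, double B2, double C2)
{
    double det = A1 * B2 - A2 * B1;
    pt_sketch p;

    assert(det != 0.0 && B1 != 0.0);    /* not parallel, line 1 not vertical */
    p.x = (B1 * C2 - B2 * C1) / det;
    /* For line 1 = {-1,-1,Inf} and x = Inf this is -(-Inf + Inf) / -1: NaN */
    p.y = -(A1 * p.x + C1) / B1;
    return p;
}

/* Same intersection, but y also comes from Cramer's rule and never uses x */
static pt_sketch intersect_cramer(double A1, double B1, double C1,
                                  double A2, double B2, double C2)
{
    double det = A1 * B2 - A2 * B1;
    pt_sketch p;

    assert(det != 0.0);                 /* lines must not be parallel */
    p.x = (B1 * C2 - B2 * C1) / det;
    p.y = (A2 * C1 - A1 * C2) / det;
    return p;
}
```

Even with the direct form, the distance from (1e300, Inf) to the foot (Inf, Inf) still needs Inf - Inf in the y difference, which is why fixing the cross point alone does not fix the distance.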

> For example, for a line with positive slope (A and B of opposite
> signs), I think that the right answer for points (Inf,Inf) and
> (-Inf,-Inf) should be NaN, on much the same grounds that Inf
> minus Inf is NaN not zero. But all other points involving any Inf
> coordinates are clearly an infinite distance away from that line.

After some checking I noticed that the calculation with the well-known
formula was wrong.

> The formula for the distance((x0,y0) - (ax + by + c = 0)) is
>
> |ax0 + by0 + c|/sqrt(a^2 + b^2)
>
> where a = -1, b = -1, c = Inf, x0 = 1e300, y0 = Inf,

With a = -1, b = -1, c = "0", x0 = 1e300, y0 = Inf, the formula
yields Inf. Sorry for the mistake.

So we can recalculate the result using the formula when the
calculation based on the perpendicular foot yields NaN. I kept the
existing calculation for consistency between the returned
perpendicular foot and the distance value, and to keep complexity down
in the main code path.

1. So the attached patch yields "Inf" in those cases.

2. Independently of that, I noticed that the y-coordinate of the
perpendicular foot is miscalculated as NaN instead of Inf in the cases
discussed here (line_interpt_line).

3. I fixed line_construct to construct (NaN, NaN, NaN) if the input
contains NaNs.

4. Renamed the variable "isnan" to "anynan" in lseg_closept_lseg() and
box_closept_point().

5. (Not raised in the earlier comments.) line_interpt() needs to check
whether any of the coordinates is NaN, since line_interpt_line() is
defined to return such a result.
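Items 3 and 5 can be sketched in C roughly as follows. The structs and functions are hypothetical stand-ins for Point, LINE, line_construct() and line_interpt(), and the vertical/horizontal line conventions are assumptions chosen so that a vertical line through x = 3 comes out as {-1,0,3}, the notation used in the test output.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Point and LINE (A*x + B*y + C = 0) */
typedef struct { double x, y; } point_sketch;
typedef struct { double A, B, C; } line_sketch;

/*
 * Item 3: build the line through pt with slope m, but produce
 * (NaN, NaN, NaN) as soon as any input is NaN.
 */
static void line_construct_sketch(line_sketch *result,
                                  const point_sketch *pt, double m)
{
    assert(result && pt);

    if (isnan(pt->x) || isnan(pt->y) || isnan(m))
    {
        result->A = result->B = result->C = NAN;
    }
    else if (isinf(m))
    {
        /* vertical line: -x + pt->x = 0 */
        result->A = -1.0;
        result->B = 0.0;
        result->C = pt->x;
    }
    else if (m == 0.0)
    {
        /* horizontal line: -y + pt->y = 0 */
        result->A = 0.0;
        result->B = -1.0;
        result->C = pt->y;
    }
    else
    {
        /* m*x - y + (y0 - m*x0) = 0 */
        result->A = m;
        result->B = -1.0;
        result->C = pt->y - m * pt->x;
    }
}

/* Item 5: an intersection point with a NaN coordinate counts as "no result" */
static bool intersection_usable_sketch(const point_sketch *p)
{
    return !isnan(p->x) && !isnan(p->y);
}
```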

A. I'm not sure how to treat addition/subtraction/multiplication
between points. But if we regard these operations as vector
calculations, returning such values is valid, so I left them as they
are.

-- Add point
SELECT p1.f1, p2.f1, p1.f1 + p2.f1 FROM POINT_TBL p1, POINT_TBL p2;
(NaN,NaN) | (0,0) | (NaN,NaN)

B. @@ lseg (center) returns NaN-containing results. I'm not sure
whether this should be regarded as a vector calculation or as a
geometric operation. If the former, we leave it as is; otherwise we
should return NULL for such input.

=# select @@ lseg('[(NaN,1),(NaN,90)]');
?column?
------------
(NaN,45.5)
(1 row)
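To make the choice in A and B concrete, here is a hypothetical C sketch of the two interpretations for the segment center: vector semantics, where NaN simply propagates and gives (NaN,45.5) as above, and geometric semantics, where a NaN endpoint means there is no answer, which a SQL-callable wrapper could then report as NULL. Names and types are illustrative only.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical 2-D point/vector type */
typedef struct { double x, y; } vec_sketch;

/* Vector semantics: NaN propagates, just like point addition */
static vec_sketch center_as_vector(vec_sketch p1, vec_sketch p2)
{
    vec_sketch c = { (p1.x + p2.x) / 2.0, (p1.y + p2.y) / 2.0 };
    return c;
}

/*
 * Geometric semantics: a segment with a NaN endpoint has no meaningful
 * center, so report "no result" instead of a NaN-containing point.
 */
static bool center_as_geometry(vec_sketch p1, vec_sketch p2, vec_sketch *result)
{
    assert(result);
    if (isnan(p1.x) || isnan(p1.y) || isnan(p2.x) || isnan(p2.y))
        return false;
    *result = center_as_vector(p1, p2);
    return true;
}
```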

== Changes in the results ============

Items 1 and 2 above cause visible differences in some results in the
least significant digit of the mantissa, but those differences don't
matter.

> - (-3,4) | {-0.000184615384615,-1,15.3846153846} | 11.3851690368 | 11.3851690368
> + (-3,4) | {-0.000184615384615,-1,15.3846153846} | 11.3851690367 | 11.3851690367

1 restored the previous results.

> - (1e+300,Infinity) | {1,-1,0} | NaN | NaN
> - (1e+300,Infinity) | {-0.4,-1,-6} | NaN | NaN
> - (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | NaN | NaN
> + (1e+300,Infinity) | {1,-1,0} | Infinity | Infinity
> + (1e+300,Infinity) | {-0.4,-1,-6} | Infinity | Infinity
> + (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | Infinity | Infinity
>
>
> - (Infinity,1e+300) | [(0,-20),(30,-20)] | NaN | NaN
> + (Infinity,1e+300) | [(0,-20),(30,-20)] | Infinity | Infinity
> - (Infinity,1e+300) | [(0,0),(3,0),(4,5),(1,6)] | NaN | NaN
> + (Infinity,1e+300) | [(0,0),(3,0),(4,5),(1,6)] | Infinity | Infinity

Looks fine.

> -- Closest point to line
> SELECT p.f1, l.s, p.f1 ## l.s FROM POINT_TBL p, LINE_TBL l;
> - (1e+300,Infinity) | {1,-1,0} |
> - (1e+300,Infinity) | {-0.4,-1,-6} |
> - (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} |
> + (1e+300,Infinity) | {1,-1,0} | (Infinity,Infinity)
> + (1e+300,Infinity) | {-0.4,-1,-6} | (-Infinity,Infinity)
> + (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | (-Infinity,Infinity)
>
> -- Distance to line segment
> SELECT p.f1, l.s, p.f1 <-> l.s AS dist_ps, l.s <-> p.f1 AS dist_sp FROM POINT_TBL p, LSEG_TBL l;
> - (Infinity,1e+300) | [(0,-20),(30,-20)] |
> + (Infinity,1e+300) | [(0,-20),(30,-20)] | (30,-20)
>
> -- Intersection point with line
> SELECT l1.s, l2.s, l1.s # l2.s FROM LINE_TBL l1, LINE_TBL l2;
> - {-0.000184615384615,-1,15.3846153846} | {0,3,0} | (83333.3333333,-1.7763568394e-15)
> + {-0.000184615384615,-1,15.3846153846} | {0,3,0} | (83333.3333333,0)

These are fixed by 2.

> -- Distance to line
> SELECT p.f1, l.s, p.f1 <-> l.s AS dist_pl, l.s <-> p.f1 AS dist_lp FROM POINT_TBL p, LINE_TBL l;
> (1e+300,Infinity) | {-1,0,3} | NaN | NaN

This should be 1e+300, not NaN, but neither 1 nor 2 fixes this. The
reason is that line->B (0) * point->y (Infinity) results in NaN. But
given the meaning of this expression, it should be 0.

I made line_closept_point() do that, but then found a similar issue in
line_interpt_line().

> -- Closest point to line
> SELECT p.f1, l.s, p.f1 ## l.s FROM POINT_TBL p, LINE_TBL l;
> (1e+300,Infinity) | {1,0,5} | (NaN,Infinity)

So what is needed here is a special multiplication function that
overrides the 0*Inf = NaN rule with "0"*Inf = 0. I introduced that
function as float8_coef_mul(). I put the function in geo_ops.c
because it is specific to geo_ops and uses FPzero(), which is not used
in float.h. Using the function, the results are fixed as follows:

> -- Distance to line
> SELECT p.f1, l.s, p.f1 <-> l.s AS dist_pl, l.s <-> p.f1 AS dist_lp FROM POINT_TBL p, LINE_TBL l;
> (1e+300,Infinity) | {-1,0,3} | 1e+300 | 1e+300
> (Infinity,1e+300) | {0,-1,5} | 1e+300 | 1e+300
>
> -- Closest point to line
> SELECT p.f1, l.s, p.f1 ## l.s FROM POINT_TBL p, LINE_TBL l;
> (1e+300,Infinity) | {1,0,5} | (-5,Infinity)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v4-0001-add-morepoint-tests.patch text/x-patch 66.8 KB
v4-0002-fix-geometric-nan-handling.patch text/x-patch 84.3 KB
