Re: Strange behavior with polygon and NaN

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: gkokolatos(at)pm(dot)me, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Strange behavior with polygon and NaN
Date: 2020-11-24 04:55:44
Message-ID: 20201124.135544.121959508317571971.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> I spent some more time looking at this patch.
>
> Some experimentation shows that the changes around bounding box
> calculation (ie float8_min_nan() and its call sites) seem to be
> completely pointless: removing them doesn't change any of the regression
> results. Nor does using float8_min_nan() in the other two bounding-box
> calculations I'd asked about. So I think we should just drop that set
> of changes and stick with the rule that bounding box upper and lower
> values are sorted as per float.h comparison rules. This isn't that hard
> to work with: if you want to know whether any NaNs are in the box, test
> the upper limit (and *not* the lower limit) for isnan(). Moreover, even
> if we wanted a different coding rule, we really can't have it because we
> will still need to work with existing on-disk values that have bounding
> boxes computed the old way.

Actually that changes the result since that code gives a shortcut of
checking NaNs in the object coordinates. I don't think that the it is
pointless to avoid full calculations that are performed only to find
NaNs are involved, if bounding box check is meaningful.

> I don't much like anything about float8_coef_mul(). In the first place,
> FPzero() is an ugly, badly designed condition that we should be trying
> to get rid of not add more dependencies on. In the second place, it's
> really unclear why allowing 0 times Inf to be something other than NaN
> is a good idea, and it's even less clear why allowing small-but-not-zero
> times Inf to be zero rather than Inf is a good idea. In the third
> place, the asymmetry between the two inputs looks more like a bug than
> something we should actually want.

I have the same feeling on the function, but I concluded that
coefficients and coordinates should be regarded as different things in
the practical standpoint.

For example, consider Ax + By + C == 0, if B is 0.0, we can remove the
second term from the equation, regardless of the value of y, of course
even if it were inf. that is, The function imitates that kind of
removals.

> After some time spent staring at the specific case of line_closept_point
> and its callers, I decided that the real problems there are twofold.
> First, the API, or at least the callers' interpretation of this
> undocumented point, is that whether the distance is undefined (NaN) is
> equivalent to whether the closest point is undefined. This is not so;
> in some cases we know that the distance is infinite even though we can't
> calculate a unique closest point. Second, it's not making any attempt
> to eliminate undefined cases up front. We can do that pretty easily
> by plugging the point's coordinates into the line's equation Ax+By+C
> and seeing whether we get a NaN. The attached 0002 is a subset patch
> that just fixes these two issues, and I like the results it produces.

Actually the code reacts to some "problem" cases in a "wrong" way:

+ * If it is unclear whether the point is on the line or not, then the
+ * results are ill-defined. This eliminates cases where any of the given
+ * coordinates are NaN, as well as cases where infinite coordinates give
+ * rise to Inf - Inf, 0 * Inf, etc.
+ */
+ if (unlikely(isnan(float8_pl(float8_pl(float8_mul(line->A, point->x),
+ float8_mul(line->B, point->y)),
+ line->C))))

| postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}');
| ?column?
| ----------
| NaN

Aren't our guts telling that is 1e+300? You might be thinking to put
some special case handling into that path (as mentioned below?), but
otherwise it yeildsa "wrong" result. The reason for the expectation
is that we assume that "completely vertical" lines have a constant x
value regardless of the y coordinate. That is the reason for the
float8_coef_mul() function.

> I wonder now whether the problems around line_interpt_line() and the
> other intersection-ish functions wouldn't be better handled in a similar
> way, by making changes to their API specs to be clearer about what
> happens with NaNs and trying to eliminate ill-defined cases explicitly.
> I've not tried to code that though.

One of the "ill-defined" cases is the zero-coefficient issue. The
asymmetric multiply function "fixes" it, at least. Of course it could
be open-coded instead of being as a function that looks as if having
some general interpretation.

> Changing pg_hypot() the way you've done here is right out. See the
> comment for the function: what it is doing now is per all the relevant
> standards, and your patch breaks that. It's extremely unlikely that
> doing it differently from IEEE and POSIX is a good idea.

Mmm. Ok, I agree to that.

> Attached are the same old 0001 (adding the extra point_tbl entry)
> and a small 0002 that fixes just line_closept_point. I've not
> tried to figure out just which of the rest of your diffs should be
> dropped given that. I did note though that the example you add
> to func.sgml doesn't apply to this version of line_closept_point:
>
> regression=# select point '(Infinity,Infinity)' <-> line '{-1,0,5}';
> ?column?
> ----------
> NaN
> (1 row)
>
> regression=# select point '(Infinity,Infinity)' <-> line '{0,-1,5}';
> ?column?
> ----------
> NaN
> (1 row)

They root on the same "zero-coefficient issue" with my example shown
above.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Lepikhov 2020-11-24 06:04:27 Re: [POC] Fast COPY FROM command for the table with foreign partitions
Previous Message tsunakawa.takay@fujitsu.com 2020-11-24 04:27:07 RE: [POC] Fast COPY FROM command for the table with foreign partitions