Re: [HACKERS] [PATCH] Improve geometric types

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Improve geometric types
Date: 2018-01-14 12:20:57
Message-ID: CAE2gYzyY3U4n1Zn48qG6dL=FWgv29yag5PdGV2Gc17w9Toeaog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I found seemingly inconsistent handling of NaN.
>
> - Old box_same assumed NaN != NaN, but new assumes NaN ==
> NaN. I'm not sure how the difference is significant out of the
> context of sorting, though...

There is no box != box operator. If there was one, previous code
would return false for every input including NaN, because it was
ignorant about NaNs other than a few bandaids to avoid crashes.

My idea is to make the equality (same) operators behave like the float
datatypes treating NaNs as equals. The float datatypes also treat
NaNs as greater than any non-NAN. We don't need to worry about this
part now, because there are no basic comparison operators defined for
the geometric types.

> - box_overlap()'s behavior has not been changed. As the result
> box_same and box_overlap now behaves differently concerning
> NaN handling.

After your previous email, I though this would be the right thing to
do. I made the containment and placement operators return false for
NaN input unless we can find the result using non-NaN coordinates. Do
you find this behaviour reasonable?

> - line_construct_pts has been changed so that generates
> {NaN,-1,NaN} for two identical points (old code generated
> {-1,0,0}) but I think the right behavior here is erroring out.
> (as line_in behaves.)

I agree. We should also check for the endpoints being the same on
lseg_interpt functions. I will make the changes on the next version.

> I feel that it'd better to define how to handle non-numbers
> before refactoring. I prefer to just denying NaN as a part of
> the geometric types, or any input containing NaN just yields a
> result filled with NaNs.

It would be nice to deny NaNs altogether, but I don't think it is
feasible. People cannot restore their existing data if we start doing
that. It would also require us to check any result we emit being NaN
and error out in more cases.

> The next point is reasonability of the algorithm and consistency
> among related functions.
>
> - line_parallel considers the EPSILON(ugaa) with different scale
> from the old code, but both don't seem reasonable.. It might
> not be the time to touch it, but if we changed the behavior,
> I'd like to choose reasonable one. At least the tolerance of
> parallelity should be taken by rotation, not by slope.

I think you are right. Vector based algorithms would be good for many
other operations as well. This would be more changes, though. I am
try to reduce the amount of changes to increase my chances to get this
committed. I just used slope() there to increase the code reuse and
to make the operation symmetrical. I can revert it back to its
previous state, if you thing it is better.

> So we might should calculate the distance in different (not
> purely mathematical/geometrical) way. For example, if we can
> assume the geometric objects are placed mainly around the origin,
> we could take the distance between the points nearest to origin
> on both lines. (In other words, between the foots of
> perpendicular lines from the origin onto the both lines). Of
> couse this is not ideal but...

The EPSILON is unfair to coordinates closer to the origin. I am not
sure what it the best way to improve this. I am trying to avoid
dealing with EPSILON related issues in the scope of these changes.

> At last, just a simple comment.
>
> - point_eq_point() assumes that NaN == NaN. This is an inherited
> behavior from old float[48]_cmp_internal() but it's not a
> common treat. point_eq_point() needs any comment about the
> special definition as float[48]_cmp_internal has.

I hope I answered this on the previous questions. I will incorporate
the decision to threat NaNs as equals into the comments and the commit
messages on the next version, if we agree on it.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-01-14 12:33:05 Re: [PATCH] Find additional connection service files in pg_service.conf.d directory
Previous Message Greg Stark 2018-01-14 12:15:54 Re: WIP: a way forward on bootstrap data