Re: [HACKERS] [PATCH] Improve geometric types

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: emre(at)hasegeli(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)alvh(dot)no-ip(dot)org, robertmhaas(at)gmail(dot)com, a(dot)alekseev(at)postgrespro(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [PATCH] Improve geometric types
Date: 2018-01-16 12:23:01
Message-ID: 20180116.212301.205169261.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I'm still wandering on the way and confused. Sorry for
inconsistent comments in advanceX-(

At Sun, 14 Jan 2018 13:20:57 +0100, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote in <CAE2gYzyY3U4n1Zn48qG6dL=FWgv29yag5PdGV2Gc17w9Toeaog(at)mail(dot)gmail(dot)com>
> > I found seemingly inconsistent handling of NaN.
> >
> > - Old box_same assumed NaN != NaN, but new assumes NaN ==
> > NaN. I'm not sure how the difference is significant out of the
> > context of sorting, though...
>
> There is no box != box operator. If there was one, previous code
> would return false for every input including NaN, because it was
> ignorant about NaNs other than a few bandaids to avoid crashes.
>
> My idea is to make the equality (same) operators behave like the float
> datatypes treating NaNs as equals. The float datatypes also treat
> NaNs as greater than any non-NAN. We don't need to worry about this
> part now, because there are no basic comparison operators defined for
> the geometric types.

I'm not sure what you mean by the "basic comparison ops" but I'm
fine with the policy, handling each component values in the same
way with float. So point_eq_point() is right and other functions
should follow the same policy.

> > - box_overlap()'s behavior has not been changed. As the result
> > box_same and box_overlap now behaves differently concerning
> > NaN handling.
>
> After your previous email, I though this would be the right thing to
> do. I made the containment and placement operators return false for
> NaN input unless we can find the result using non-NaN coordinates. Do
> you find this behaviour reasonable?

Sorry for going back and force, but I don't see a difference
between it and the original behavior from the point of view of
reasonability. Isn't is enough to let each component comparison
follow float by redefining FPxx() functions?

For example, define FPle as the follwoing:

static inline bool FP8le(float8 a, float8 b)
{
return float8_le(a, float8_pl(b, EPSILON));
}

Then define box_ov using this function:

static bool box_ov(BOX *box1, BOX *box2)
{
return (FP8le(box1->low.x, box2->high.x) &&....);
}

Another example, define FPeq as:

static inline bool FP8eq(float8 a, float8 b)
{
if (isnan(a))
{
if (isnan(b))
return true;
return false;
}
else if (isnan(b))
return false;

if (isinf(a) && isinf(b))
return true;

return abs(a - b) <= EPSILON;
}

Then redefine point_eq_point as:

statinc inline bool
point_eq_point(Point *pt1, Point *pt2)
{
return FP8eq(pt1->x, pt2->x) && FP8eq(pt1->y, pt2->y);
}

...

At least this is in a kind of consistency and the behavior is
inferable (in a certain sense).

> > - line_construct_pts has been changed so that generates
> > {NaN,-1,NaN} for two identical points (old code generated
> > {-1,0,0}) but I think the right behavior here is erroring out.
> > (as line_in behaves.)
>
> I agree. We should also check for the endpoints being the same on
> lseg_interpt functions. I will make the changes on the next version.
>
> > I feel that it'd better to define how to handle non-numbers
> > before refactoring. I prefer to just denying NaN as a part of
> > the geometric types, or any input containing NaN just yields a
> > result filled with NaNs.
>
> It would be nice to deny NaNs altogether, but I don't think it is
> feasible. People cannot restore their existing data if we start doing
> that. It would also require us to check any result we emit being NaN
> and error out in more cases.

It sounds right.

> > The next point is reasonability of the algorithm and consistency
> > among related functions.
> >
> > - line_parallel considers the EPSILON(ugaa) with different scale
> > from the old code, but both don't seem reasonable.. It might
> > not be the time to touch it, but if we changed the behavior,
> > I'd like to choose reasonable one. At least the tolerance of
> > parallelity should be taken by rotation, not by slope.
>
> I think you are right. Vector based algorithms would be good for many
> other operations as well. This would be more changes, though. I am
> try to reduce the amount of changes to increase my chances to get this
> committed. I just used slope() there to increase the code reuse and
> to make the operation symmetrical. I can revert it back to its
> previous state, if you thing it is better.

I'm fine with both using slope with the same scale (that is,
keeping the previous behavior) or going into vector based. But
the latter raises a problem as below:(

> > So we might should calculate the distance in different (not
> > purely mathematical/geometrical) way. For example, if we can
> > assume the geometric objects are placed mainly around the origin,
> > we could take the distance between the points nearest to origin
> > on both lines. (In other words, between the foots of
> > perpendicular lines from the origin onto the both lines). Of
> > couse this is not ideal but...
>
> The EPSILON is unfair to coordinates closer to the origin. I am not
> sure what it the best way to improve this. I am trying to avoid
> dealing with EPSILON related issues in the scope of these changes.

Right. And I don't have a good idea here..

> > At last, just a simple comment.
> >
> > - point_eq_point() assumes that NaN == NaN. This is an inherited
> > behavior from old float[48]_cmp_internal() but it's not a
> > common treat. point_eq_point() needs any comment about the
> > special definition as float[48]_cmp_internal has.
>
> I hope I answered this on the previous questions. I will incorporate
> the decision to threat NaNs as equals into the comments and the commit
> messages on the next version, if we agree on it.

Perhas it's fine for me.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-01-16 12:50:03 Re: [HACKERS] Deadlock in XLogInsert at AIX
Previous Message Geoff Winkless 2018-01-16 11:44:37 Re: proposal: alternative psql commands quit and exit