Re: [HACKERS] [PATCH] Improve geometric types

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: emre(at)hasegeli(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)alvh(dot)no-ip(dot)org, robertmhaas(at)gmail(dot)com, a(dot)alekseev(at)postgrespro(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [PATCH] Improve geometric types
Date: 2018-01-12 08:16:56
Message-ID: 20180112.171656.119920237.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, sorry for the absense.

(Sorry for the long but should be hard-to-read article..)

At Thu, 4 Jan 2018 14:53:47 +0100, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote in <CAE2gYzz0hRfC-M8KDV4Aytj+6k6cvoiGqVvEjVyHTbtraf=YBQ(at)mail(dot)gmail(dot)com>
> Rebased versions are attached.

Thank you for the new patch.

I'm looking 0001 patch. This does many things at once but seems
hard to split into step-by-step patches. So I reviewed this from
the viewpoint that how each of the external function is
changed. (Attatchment 1).

I think that this patch is not intending to change any behaviors
of the external functions, but intending make it mathematically
reasonable. So some behavioral changes are ineviablly
requried. The problem here is what is the most reasonable (and
acceptable) behavior.

The comments below are just the starting of a discussion about
some aspects of design. I'm sorry that this discussion might be
changing the way to go largily, but I'd like to hear what you
think about two topics.

I found seemingly inconsistent handling of NaN.

- Old box_same assumed NaN != NaN, but new assumes NaN ==
NaN. I'm not sure how the difference is significant out of the
context of sorting, though...

- box_overlap()'s behavior has not been changed. As the result
box_same and box_overlap now behaves differently concerning
NaN handling.

- line_construct_pts has been changed so that generates
{NaN,-1,NaN} for two identical points (old code generated
{-1,0,0}) but I think the right behavior here is erroring out.
(as line_in behaves.)

I feel that it'd better to define how to handle non-numbers
before refactoring. I prefer to just denying NaN as a part of
the geometric types, or any input containing NaN just yields a
result filled with NaNs.

The next point is reasonability of the algorithm and consistency
among related functions.

- line_parallel considers the EPSILON(ugaa) with different scale
from the old code, but both don't seem reasonable.. It might
not be the time to touch it, but if we changed the behavior,
I'd like to choose reasonable one. At least the tolerance of
parallelity should be taken by rotation, not by slope.

I think that one reasonable way to do this is taking the distance
between the far ends of two direction (unit) vectors.

Assuming Ax + By + C = 0, the direction vecotr dv(x, y) for the
line is:

n = sqrt(A^2 + B^2), dv = (B / n, -A / n).

(and the normal vector nv = (A / n, B / n))

The vector needs to be restricted within upper or any two
successive quadrants so that it is usable for this objective. So
let's restrict A to be negative value for now. Something like the
follwoing would be an answer.

void line_direction_vector(Point *result, LINE *l)
{
float8 A = l->A;
float8 B = l->B;
float8 n;

n = HYPOT(A, B);

/* keep the result vector within the 1st-2nd quadrants */
if (A > 0)
{
A = -A;
B = -B;
}
point_construct(result, B / n, -A / n);
}

bool line_parallel(LINE *l1, LINE *l2)
{
Point d1, d2;

line_direction_vector(&d1, l1);
line_direction_vector(&d2, l2);
return FPzero(point_dt(&d1, &d2));
}

Also perpendicularity is detected by comparing the direction
vector of one line and the normal vector of another in the same
manner.

void line_normal_vector(Point *result, LINE *l)
{
float8 A = l->A;
float8 B = l->B;
float8 n;

/* keep the result vector within the 1st-2nd quadrants */
n = HYPOT(A, B);
if (B < 0)
{
A = -A;
B = -B;
}
point_construct(result, A / n, B / n);
}

bool line_perp(LINE *l1, LINE *l2)
{
Point d1, d2;

line_direction_vector(&d1, l1);
line_normal_vector(&d2, l2);
return FPzero(point_dt(&d1, &d2));
}

In relation to this, equality of two lines is also nuisance.
From the definitions, two equal lines are parallel. In
contraposition, two nonparallel lines cannot be equal. This
relationship is represented by the following expression:

is_equal =: line_parallel(l1, l2) && FPzero(line_distance(l1, l2))

But the line_distance returns zero in most cases even if it is
considered "parallel" with considering tolerance. This means that
There's almost no two lines that are parallel but not equal (that
is, the truely parallel lines)... sigh.

So we might should calculate the distance in different (not
purely mathematical/geometrical) way. For example, if we can
assume the geometric objects are placed mainly around the origin,
we could take the distance between the points nearest to origin
on both lines. (In other words, between the foots of
perpendicular lines from the origin onto the both lines). Of
couse this is not ideal but...

The same discussion also affects line_interpt_line or other
similar functions.

At last, just a simple comment.

- point_eq_point() assumes that NaN == NaN. This is an inherited
behavior from old float[48]_cmp_internal() but it's not a
common treat. point_eq_point() needs any comment about the
special definition as float[48]_cmp_internal has.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marina Polyakova 2018-01-12 08:37:12 Re: master make check fails on Solaris 10
Previous Message Fabien COELHO 2018-01-12 07:53:33 Re: bytea bitwise logical operations implementation (xor / and / or / not)