Re: Floating point comparison inconsistencies of the geometric types

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: emre(at)hasegeli(dot)com
Cc: kgrittn(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, andreas(at)proxel(dot)se, teodor(at)sigaev(dot)ru, robertmhaas(at)gmail(dot)com, kgrittn(at)ymail(dot)com, Jim(dot)Nasby(at)bluetreble(dot)com, mail(at)joeconway(dot)com
Subject: Re: Floating point comparison inconsistencies of the geometric types
Date: 2016-11-11 07:39:47
Message-ID: 20161111.163947.218430692.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

> Aside from that, I'd like to comment this patch on other points
> later.

The start of this patch is that the fact that most of but not all
geometric operators use fuzzy comparson. But Tom pointed out that
the fixed fuzz factor is not reasonable but hard to find how to
modify it.

We can remove the fuzz factor altogether but I think we also
should provide a means usable to do similar things. At least "is
a point on a line" might be useless for most cases without any
fuzzing feature. (Nevertheless, it is a problem only when it is
being used to do that:) If we don't find reasonable policy on
fuzzing operations, it would be the proof that we shouldn't
change the behavior.

The 0001 patch adds many FP comparison functions individually
considering NaN. As the result the sort order logic involving NaN
is scattered around into the functions, then, you implement
generic comparison function using them. It seems inside-out to
me. Defining ordering at one place, then comparison using it
seems to be reasonable.

The 0002 patch repalces many native operators for floating point
numbers by functions having sanity check. This seems to be
addressing to Tom's comment. It is reasonable for comparison
operators but I don't think replacing all arithmetics is so. For
example, float8_div checks that

- If both of operands are not INF, result should not be INF.
- If the devident is not exactly zero, the result should not be zero.

The second assumption is wrong by itself. For an extreme case,
4.9e-324 / 1.7e+308 becomes exactly zero (by underflow). We
canont assert it to be wrong but the devedent is not zero. The
validity of the result varies according to its meaning. For the
case of box_cn,

> center->x = float8_div(float8_pl(box->high.x, box->low.x), 2.0);
> center->y = float8_div(float8_pl(box->high.y, box->low.y), 2.0);

If the center somehow goes extremely near to the origin, it could
result in a false error.

> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
> ERROR: value out of range: underflow

I don't think this underflow is an error, and actually it is a
change of the current behavior without a reasonable reason. More
significant (and maybe unacceptable) side-effect is that it
changes the behavior of ordinary operators. I don't think this is
acceptable. More consideration is needed.

> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
> ERROR: value out of range: underflow

In regard to fuzzy operations, libgeos seems to have several
types of this kind of feature. (I haven't looked closer into
them). Other than reducing precision seems overkill or
unappliable for PostgreSQL bulitins. As Jim said, can we replace
the fixed scale fuzz factor by precision reduction? Maybe, with a
GUC variable (I hear someone's roaring..) to specify the amount
defaults to fit the current assumption.

https://apt-browse.org/browse/ubuntu/trusty/universe/i386/libgeos++-dev/3.4.2-4ubuntu1/file/usr/include/geos/geom/BinaryOp.h

/*
* Define this to use PrecisionReduction policy
* in an attempt at by-passing binary operation
* robustness problems (handles TopologyExceptions)
*/
...
* Define this to use TopologyPreserving simplification policy
* in an attempt at by-passing binary operation
* robustness problems (handles TopologyExceptions)
(seems not activated)
...
* Use common bits removal policy.
* If enabled, this would be tried /before/
* Geometry snapping.
...
/*
* Use snapping policy
*/

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-11-11 07:42:43 Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Previous Message Corey Huinker 2016-11-11 07:17:32 Re: Do we need use more meaningful variables to replace 0 in catalog head files?