Re: Floating point comparison inconsistencies of the geometric types

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: emre(at)hasegeli(dot)com
Cc: kgrittn(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, andreas(at)proxel(dot)se, teodor(at)sigaev(dot)ru, robertmhaas(at)gmail(dot)com, kgrittn(at)ymail(dot)com, Jim(dot)Nasby(at)bluetreble(dot)com, mail(at)joeconway(dot)com
Subject: Re: Floating point comparison inconsistencies of the geometric types
Date: 2017-01-18 08:36:12
Message-ID: 20170118.173612.13506164.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

(I apologize in advance for possible inaccurate wording on maths,
which might cause confusion..)

At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote in <CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw(at)mail(dot)gmail(dot)com>
> > - Floating point comparisons for gemetric types
> >
> > Comparison related to geometric types is performed by FPeq
> > macro and similars defined in geo_decls.h. This intends to give
> > tolerance to the comparisons.
> >
> > A
> > FPeq: |<=e-|-e=>| (<= means inclusive, e = epsilon = tolerance)
> > FPne: ->| e | e |<- (<- means exclusive)
> > FPlt: | e |<-
> > FPle: |<=e |
> > FPgt: ->| e |
> > FPge: | e=>|
> >
> > These seems reasonable ignoring the tolerance amount issue.
>
> I tried to explain below why I don't find this method reasonable.

Thank you very much for the explanation.

> > At least for the point type, (bitmap) index scan is consistent
> > with sequential scan. I remember that the issue was
> > "inconsistency between indexed and non-indexed scans over
> > geometric types" but they seem consistent with each other.
>
> You can see on the Git history that it was a lot of trouble to make
> the indexes consistent with the operators, and this is limiting
> improving indexing of the geometric types.

Yeah. geo_ops.c has the following comment from its first version
in the current repository.

| * Relational operators for Points.
| * Since we do have a sense of coordinates being
| * "equal" to a given accuracy (point_vert, point_horiz),

So, we take it as a requirement for geometric types. All the
difficulties come from the "nature".

> > You mentioned b-tree, which don't have predefined opclass for
> > geometric types. So the "inconsistency" may be mentioning the
> > difference between geometric values and combinations of plain
> > (first-class) values. But the two are different things and
> > apparently using different operators (~= and = for equality) so
> > the issue is not fit for this. More specifically, "p ~= p0" and
> > "x = x0 and y = y0" are completely different.
>
> Yes, the default btree operator class doesn't have to implement
> standard equality and basic comparison operator symbols. We can
> implement it with different symbols.

Perhaps I don't understand it. Many opclass are defined for
btree. But since (ordinary) btree handles only one-dimentional,
totally-orderable sets, geometric types even point don't fit
it. Even if we relaxed it by removing EPSILON, multidimentional
data still doesn't fit.

Still we can implement equality mechanism *over* multiple btree
indexes, but it cannot be a opclass for btree.

> > Could you let me (or any other guys on this ml) have more precise
> > description on the problem and/or what you want to do with this
> > patch?
>
> I will try to itemize the current problems with the epsilon method:
>
> = Operator Inconsistencies
>
> My current patches address all of them.

No. It just removes tolerans that is a "requirement" for
coordinates of geometric types. I think we don't have enough
reason to remove it. We could newly define geometric types with
no-tolerance as 'exact_point" or something but it still doesn't
fit btree.

> - Only some operators are using the epsilon.
>
> I included the list of the ones which doesn't use the epsilon on
> initial post of this thread. This makes the operators not compatible
> with each other. For example, you cannot say "if box_a @> box_b and
> box_b @> point then box_a @> box_b".

(It must be "then box_a @> point".) Both of the operators don't
seem to use EPSILON so the transitivity holds, but perhaps we
should change them to more appropriate shape, that is, to use
EPSILON. Even having the tolerance, we can define the operators
to keep the transitivity.

> - The application of epsilon is implementation dependent.
>
> Epsilon works between two numbers. There is not always a single way
> of implementing geometric operators. This is surprising to the users.
> For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
> epsilon".

Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
..". I'm not sure off-hand. But I don't see the relationship is
significant.

> This is also creating a high barrier on developing those operators.
> Fixing bugs and performance regressions are very likely to change the
> results.

I agree to you. The effect of the EPSILON is quite arbitrary and
difficult to see their chemistry.

> - Some operators are just wrong:
>
> Line operators are not well maintained. The following are giving
> wrong results when neither A or B of lines are not exactly 1:
>
> * line # line
> * line <-> line
> * point ## line
> * point ## lseg

These are not comparison operators so EPSILON is unrelated. And I
agree that they needs fix.

> They are broken independently from the epsilon, though it is not easy
> to fix them without changing the results of the cases that are
> currently working.

Although I'm not sure they are using EPSILON, I think they don't
need it.

> - Some operators are violating commutative property.
>
> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".

Whether EPSILON is introduced or not, commutativity cannot be
assured for it from calculation error, I suppose.

> - Some operators are violating transitive property.
>
> For example, you cannot say "if point_a ~= point_b and point_b ~=
> point_c then point_a ~= point_c".

It holds only in ideal (or mathematical) world, or for similars
of integer (which are strictly implemented mathematica world on
computer). Without the EPSILON, two points hardly match by
floating point error, as I mentioned. I don't have an evidence
though (sorry), mere equality among three floating point numbers
is not assured.

Sorry, I have no more time today, I'll continue tomorrow.

> - The operators with epsilon are not compatible with float operators.
>
> This is also surprising for the users. For example, you cannot say
> "if point_a << point_b then point_a[0] < point_b[0]".
>
> - The operators are not checking for underflow or overflow.
> - NaN values are not treated same as the float datatype.
>
> Those are independent problems, though checking them with epsilon
> would create additional set of inconsistencies. Epsilon creates
> especially more problems around 0.
>
> = Missing Bits on the Operator Classes
>
> I want to address those in the future, if my patches are accepted.
>
> - There is no GiST or SP-GiST support for cross datatype operators
> other than containment.
>
> We can develop nice cross-datatype operator families to support more
> operators. It would have been easier, if we could rely on some
> operators to implement others. Geometric types serve the purpose of
> demonstrating our indexing capabilities. We should make them good
> examples, not full of special cases with hidden bugs.
>
> - There is no BRIN support for types other than the box.
>
> BRIN inclusion operator class is datatype agnostic. It uses some
> operators to implement others which doesn't work for anything more the
> box vs box operators, because of the inconsistencies.
>
> - GROUP BY and DISTINCT are not working.
> - ORDER BY is not working.
>
> There are regular user complaints about this on the mailing list. We
> can easily provide default hash and btree operator classes that would
> fix the problem, if we haven't had the epsilon.
>

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2017-01-18 09:54:22 Re: pageinspect: Hash index support
Previous Message Rahila Syed 2017-01-18 07:55:06 Re: Parallel Index Scans