Re: Floating point comparison inconsistencies of the geometric types

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: Floating point comparison inconsistencies of the geometric types
Date: 2016-11-13 21:41:01
Message-ID: CAE2gYzymeQXGGmhU1Vc35DpugwfRd-QRK3BM-6TGg0rwHcDN_w@mail.gmail.com
Lists: pgsql-hackers

> We can remove the fuzz factor altogether, but I think we should
> also provide some means of doing similar things. At least "is a
> point on a line" might be useless for most cases without any
> fuzzing feature. (Nevertheless, it is a problem only when the
> operator is being used for that:) If we can't find a reasonable
> policy on fuzzing operations, that would be proof that we
> shouldn't change the behavior.

It was my initial idea to keep the fuzzy comparison behaviour in
some places, but the more I got into it, the more I realised that it
is almost impossible to get right. Instead, I re-implemented some
operators to preserve as much precision as possible. The previous "is
a point on a line" operator would *never* return true without the
fuzzy comparison. The new implementation returns true when no
precision is lost. I think this is behaviour that people who work
with floating point are prepared to deal with. By the way, the "is a
point on a line" operator is quite wrong with the fuzzy comparison at
the moment [1].
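
To make this concrete, here is a hypothetical example (not from the
patch; the line coefficients and the point are made up, and the
result under exact comparison depends on how the operator evaluates
the line equation internally). The point (0.1, 0.2) lies exactly on
the line x + y - 0.3 = 0 in real arithmetic, but in float8 the
residual 0.1 + 0.2 - 0.3 comes out around 5.6e-17:

    -- Point-on-line containment, fuzzy vs. exact comparison:
    select point '(0.1, 0.2)' <@ line '{1, 1, -0.3}';
    -- fuzzy (|residual| <= 1e-6):  true
    -- exact (residual == 0):       likely false; residual is ~5.6e-17

The current code compares the residual against the fixed EPSILON of
1.0e-6 from geo_decls.h; with exact comparison, only points whose
coordinates produce no rounding error at all would test as being on
the line.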

> The 0001 patch adds many FP comparison functions, each considering
> NaN individually. As a result, the sort order logic involving NaN
> is scattered around in those functions, and then you implement a
> generic comparison function using them. That seems inside-out to
> me. Defining the ordering in one place, then doing the comparison
> using it, seems more reasonable.

I agree that it would be simpler to use the comparison function to
implement the other operators. I have done it the other way around to
make them more optimised, as they are called very often. I don't
think checking the return value of the comparison function would be
optimised the same way. I could have left the comparison functions as
they were, but I re-implemented them using the others to keep the
documentation of NaN comparison in a single place.
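
Whichever way the functions are layered, the observable behaviour to
preserve is the float8 one: NaN compares equal to NaN and sorts after
every other value. A quick illustration in psql:

    -- float8 treats NaN as equal to itself and greater than anything
    -- else; the geometric comparison helpers have to match this.
    select 'nan'::float8 = 'nan'::float8 as nan_eq_nan,   -- true
           'nan'::float8 > 'inf'::float8 as nan_gt_inf;   -- true

    select x
    from (values ('-inf'::float8), ('0'::float8),
                 ('inf'::float8), ('nan'::float8)) as t(x)
    order by x;                                           -- NaN sorts last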

> If the center somehow goes extremely near to the origin, it could
> result in a false error.
>
>> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
>> ERROR: value out of range: underflow
>
> I don't think this underflow should be an error, and it is actually
> a change of the current behavior without a good reason. A more
> significant (and maybe unacceptable) side effect is that it
> changes the behavior of ordinary operators. I don't think this is
> acceptable. More consideration is needed.
>
>> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
>> ERROR: value out of range: underflow

This is the current behaviour of the float datatype; my patch doesn't
change that. The same problem would probably also apply to
multiplying very small values. I agree that this is not the ideal
behaviour, though I am not sure we should go in a different direction
than the float datatypes.

I think there is value in making the geometric types compatible with
float8. Users are going to mix them anyway. For example, a user can
calculate the center of a box manually and be confused when the
built-in operator behaves differently.
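
As a hypothetical illustration (the coordinates are made up), the
goal is that the built-in centre operator and the same centre
computed by hand with float8 arithmetic round, overflow, and
underflow identically, so queries like these stay consistent:

    -- Built-in centre of a box versus the manually computed centre;
    -- consistency means both succeed or both fail (e.g. both raise
    -- the underflow error shown above for denormal coordinates).
    select @@ box '(0.1, 0.1), (0.3, 0.3)'          as builtin_center,
           point((0.1::float8 + 0.3::float8) / 2.0,
                 (0.1::float8 + 0.3::float8) / 2.0) as manual_center;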

> In regard to fuzzy operations, libgeos seems to have several
> features of this kind. (I haven't looked closely into them.)
> Anything other than reducing precision seems overkill or
> inapplicable for the PostgreSQL builtins. As Jim said, can we
> replace the fixed-scale fuzz factor with precision reduction?
> Maybe with a GUC variable (I hear someone roaring...) to specify
> the amount, defaulting to match the current assumption.

I am disinclined to implement something complicated for the geometric
types. I think they are mostly useful for two purposes: uses simple
enough that better solutions are not worth looking for, and
demonstrating our indexing capabilities. The inconsistencies harm
both of those.

[1] https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com
