Re: BUG #6399: knngist sometimes returns tuples in incorrect order

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org, oleg(at)sai(dot)msu(dot)su, teodor(at)sigaev(dot)ru
Subject: Re: BUG #6399: knngist sometimes returns tuples in incorrect order
Date: 2012-01-19 08:54:42
Message-ID: 4F17DA52.9080803@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 19.01.2012 06:35, YAMAMOTO Takashi wrote:
>> On 18.01.2012 14:07, Heikki Linnakangas wrote:
>>> For 9.2, I think we should change gist so
>>> that the user-defined distance function can return any scalar data type,
>>> not just float8 (as long as it has b-tree comparison operators).
>>
>> I took a shot at doing that. Patch attached. It needs some cleanup; I
>> think we'll need to bump the version of the btree_gist extension, and I
>> only changed the btree_int8 distance function to do that, even though it
>> would now make a lot of sense to adjust the others, too. Also, I think
>> it'll leak memory (or crash..) if the distance data type is pass-by-ref.
>>
>> If someone has a test suite at hand for performance testing this, that
>> would be good. Now that I'm calling the b-tree comparison function for
>> every comparison operation in the red-black tree, instead of a direct
>> float8<= instruction, this could be quite a bit slower. That could be
>> mitigated somewhat by using the sort-support stuff Tom committed recently.
>>
>> If we bend things a bit, this might be back-patchable to 9.1. We can't
>> add a new am support function easily, that would require a catalog
>> update to increment pg_am.amsupport entry, but we could hardwire the
>> support for int8 distances, like I hardwired the knowledge of float8's
>> comparsion function in the attached patch.
>
> thanks for taking a look quickly.
>
> i have some questions:
>
> doesn't gbt_int8_distance overflow?

Yes, good point.

> probably result += INT64_MIN so that it fits to int8 keeping the order?

It can still overflow. The maximum distance between two int64s is (2^63
- 1) - (-2^63) = 2^64 - 1. That's the distance between the smallest
possible int64 (-2^63) and the greatest one (2^63 - 1)

. The int8_dist function, which implements the <-> operator, can also
overflow, but it checks for it and throws an error:

postgres=# SELECT 9223372036854775807 <-> -9223372036854775808;
ERROR: bigint out of range

We could add such an overflow check to gbt_int8_distance(), but I wonder
if it would have some surprising effects. GiST might calculate distance
to a value that's not in the final result set, so you'd get an error
that you wouldn't get without the index. Then again, the alternative
plan is to fetch the matching rows and sort, and the distance calculated
for the sort will also overflow.

Another solution for gbt_int8_distance() would be to treat the distance
as an unsigned value, changing the comparison operation accordingly.

I see that the distance functions for int4 and int2 can also overflow.
The gist support function doesn't have the same problem with float8
representation, but I think it would be good idea to change int4_dist to
return a bigint, so that it couldn't overflow (and int2_dist to return
an int4).

> isn't strategy number necessary to find out the sorting semantics for
> the operator's sortfamily? (your patch doesn't change it, but...)

Sorry, I didn't understand that. Can you elaborate?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Heikki Linnakangas 2012-01-19 10:25:36 Re: BUG #6401: IS DISTINCT FROM improperly compares geomoetric datatypes
Previous Message srinivas_j2ee 2012-01-19 05:33:31 BUG #6402: Installer hung and wouldn't work again