Re: Incorrect behaviour when using a GiST index on points

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incorrect behaviour when using a GiST index on points
Date: 2012-02-20 17:34:21
Message-ID: CAPpHfds-sGWBaRaaPB7HpjUYTV0TgkN2i+3O0oe8QqyWSR1CQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 20, 2012 at 7:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov
> > <aekorotkov(at)gmail(dot)com>wrote:
> >> Described differences leads to incorrect behaviour of GiST index.
> >> The question is: what is correct way to fix it? Should on_pb also use
> FP*
> >> or consistent method should behave like on_pb?
>
> > Any comments on this? Current behaviour definitely indicates a bug, and
> I'm
> > ready to fix it. The only question: is this bug in on_pb or gist?
>
> I'm inclined to think the right answer is to make on_pb use the FP*
> macros, for consistency with other geometric operators. But it's worth
> asking whether that will actually fix the problem. I've thought for
> some time that we'd eventually find cases where geo_ops' use of fuzzy
> comparisons breaks index behavior entirely, because it destroys natural
> assumptions like the transitive law. So that could eventually lead us
> to rip out the FP* macros everywhere.
>
> In any case, this doesn't seem like something we could back-patch;
> it'd be a behavioral change in HEAD only.
>

Analyzing GiST interface methods more detail I found one other place where
fuzzy comparison was used. It is gist_box_same function. And it's really
scary thing. It means that entry of internal page is not extended when
inserting index tuple extends it in less than EPSILON. Correspondingly
current GiST search behaviour is neither exact search or fuzzy search with
given EPSILON. It is something in the middle. See following example for
proof.

test=# create table test(a box);
CREATE TABLE
test=# insert into test (select (case when i%2= 0 then
box(point(1,1),point(1,1)) else box(point(2,2),point(2,2)) end) from
generate_series(1,300) i);
INSERT 0 300
test=# create index test_idx on test using gist(a);
CREATE INDEX
test=#
test=# insert into test values (box(point(1.0000009,1.0000009),
point(1.0000009,1.0000009)));
INSERT 0 1
test=# select * from test where a && box(point(1.0000018,1.0000018),
point(1.0000018,1.0000018));
a
---------------------------------------------
(1.0000009,1.0000009),(1.0000009,1.0000009)
(1 row)

test=# set enable_seqscan = off;
SET
test=# select * from test where a && box(point(1.0000018,1.0000018),
point(1.0000018,1.0000018));
a
---
(0 rows)

But, I believe we still can bachpatch it in one of following ways:
1) Fix consistent and same functions. Add to release note notice that users
should rebuild indexes if they want correct behaviour.
2) Leave same function as is. Add kluges to consistent functions which
provides correct search on current not truly correct trees.

But anyway +1 for rip out the FP* macros everywhere in HEAD. Because I
really dislike the way FP* are currently implemented. Why EPSILON
is 1.0E-06? We don't know anything about nature of data that users store in
geometrical datatypes. The lesser double precision value is 5E-324. For
some datasets EPSILON can easily cover whole range.

------
With bestregards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2012-02-20 17:53:55 Re: 16-bit page checksums for 9.2
Previous Message Kevin Grittner 2012-02-20 15:58:23 Re: pgsql_fdw, FDW for PostgreSQL server