From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|

To: | William White <bwhite(at)frognet(dot)net> |

Cc: | Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, pgsql-hackers(at)postgreSQL(dot)org |

Subject: | Fixing r-tree semantics |

Date: | 2005-06-23 21:59:25 |

Message-ID: | 8760.1119563965@sss.pgh.pa.us |

Views: | Raw Message | Whole Thread | Download mbox | Resend email |

Thread: | |

Lists: | pgsql-hackers |

I looked into the r-tree breakage discussed in this thread:

http://archives.postgresql.org/pgsql-general/2004-03/msg01135.php

The executive summary: r-tree index opclasses contain four two-dimensional

operators, which behave correctly, and four one-dimensional operators

which do not. There is a basic logic error in the handling of the 1-D

operators. One could also legitimately ask why the opclasses don't cover

both directions (X and Y) for 1-D inquiries. The same problems exist in

the contrib/rtree_gist implementation, because it just copied the r-tree

logic without inquiring too closely into it :-(

We currently have built-in opclasses for types "box" and "polygon", both

of which are fundamentally 2-D objects. The 2-D operators that the r-tree

opclasses handle are:

~= same (ordinary equality)

&& overlaps

~ contains

@ is contained by

with pretty much the intuitive definitions of these things. The 1-D

operators in the opclasses are

<< left l.max_x < r.min_x

>> right l.min_x > r.max_x

&< overleft l.max_x <= r.max_x

&> overright l.min_x >= r.min_x

(I'm not here to argue about whether these definitions are right in

detail, particularly about the equality cases; that's the way it's been

since Berkeley and I'm not proposing to change them.)

Now the problem is that given a query box, say "index_col << some_box",

the rtree code has to decide whether to descend to a child page of the

index tree based on what is in the parent index page's entry for the

child --- and what is there is the union, or minimum combined bounding

box, of the boxes or polygons in the child. So the test for descending

is not the same as the test for whether a leaf index entry actually

matches the query. Fine ... but somebody got this wrong long ago.

If you think about it, the criterion for descending during a << (left)

search is properly

union_box.min_x < query.min_x

If this is true, there might be boxes within the union that satisfy

the << requirement (box.max_x < query.min_x); if it is not true then

there clearly can be no such boxes. So, given the available operators,

the correct test for descending is "!overright". But the actual test

being done, according to rtstrat.c, is "overleft". This causes the search

to fail to search child pages that should be searched (and probably also

to waste time searching pages that shouldn't be searched). The observed

result is, not surprisingly, that the indexscan fails to find some rows

it should find.

In the same way, the correct descent tests for the other operators are

overleft: !right

right: !overleft

overright: !left

overlaps: overlaps

same: contains

contains: contains

contained: overlaps

rtstrat.c gets the first three of these wrong, but the last four cases

covering the 2-D operators are correct.

This analysis explains why we've not heard more complaints about such a

fundamental breakage: the cases most people care about, when using an

r-tree, are 2-D inquiries. And what's more, the default selectivity

estimates for 1-D inquiries are so low that the index never got used.

In 8.1 this will change, because a bitmap index scan looks attractive

to the planner even at rather low selectivity --- which means that we

are probably going to hear more complaints, if we don't fix it.

Fixing the existing operators seems relatively straightforward; there will

need to be some extension to rtstrat.c to represent "NOT this operator"

as well as "this operator", but that's not hard. I plan to do this, and

make the corresponding fixes in contrib/rtree_gist as well.

What needs more discussion is that it seems to me to make sense to extend

the standard opclasses to handle the four Y-direction operators comparable

to the X-direction operators that are already there, that is "above",

"below", "overabove", and "overbelow". The polygon type has none of these

operators at the moment. Box has

<^ below l.max_y <= r.min_y

>^ above l.min_y >= r.max_y

but not the overlap variants.

If you compare these to the X-direction versions:

<< left l.max_x < r.min_x

>> right l.min_x > r.max_x

there are two obvious discrepancies: the names aren't very similar and

the equality cases are handled differently.

We could incorporate the existing box_above and box_below operators into

an r-tree opclass if we defined overabove and overbelow to not match on

the equality case:

overbelow l.max_y < r.max_y

overabove l.min_y > r.min_y

However, it seems just plain weird to me to have different edge-case

behaviors in the X and Y directions. So I don't much care for that.

I would prefer that the operators added to the opclasses act the same

in both directions.

We could avoid any backwards-compatibility complaints if we left

those two operators alone (maybe redocumenting them as "below or touching"

and "above or touching", though these descriptions are a bit misleading)

and invented four new operators to be the Y-direction opclass members,

say

<<^ below l.max_y < r.min_y

>>^ above l.min_y > r.max_y

&<^ overbelow l.max_y <= r.max_y

&>^ overabove l.min_y >= r.min_y

This has a lot to recommend it: backwards compatibility and operator names

that line up with the X-direction case. On the other hand, it'll confuse

people to have operators that are so close in function, and having one be

indexable and the other not seems like a gotcha.

Plan C would be to just change the above and below operators, on the

grounds that it is an obvious typo that they don't match left and right

to begin with. We have made greater changes in the behavior of geometric

operators in the past, so this isn't an obviously bogus choice.

Not quite sure what to do, but I'd like to do something with it.

Thoughts?

regards, tom lane

- Re: Fixing r-tree semantics at 2005-06-23 22:53:05 from Andrew - Supernews
- Re: Fixing r-tree semantics at 2005-06-23 22:53:47 from Michael Fuhr
- Re: Fixing r-tree semantics at 2005-06-24 05:38:02 from Oleg Bartunov

From | Date | Subject | |
---|---|---|---|

Next Message | Andrew - Supernews | 2005-06-23 22:53:05 | Re: Fixing r-tree semantics |

Previous Message | Andrew Dunstan | 2005-06-23 21:03:50 | Re: regression failure |