Re: Question about rtrees (overleft replacing left in nodes)

From: William White <bwhite(at)frognet(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question about rtrees (overleft replacing left in nodes)
Date: 2004-03-31 17:29:39
Message-ID: 406B0003.8030902@frognet.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-patches

Tom Lane wrote:

> It'd need to be
>
> S &< T iff inf(S) <= sup(T)
>
> to satisfy the geometrical intuition. (You could quibble about the
> equality case, but box_overlap seems to consider touching boxes to
> overlap, so for consistency this should too.)
>
> However, if this is indeed wrong, why have we not heard bug reports
> stating that rtree indexes don't work?

Because the positionsel/positionjoinsel estimator functions make it all
but impossible to invoke index search on r-trees for position operators;
see below.

> Can you generate a test case
> in which it fails?

Yes, actually; I was able to demonstrate there is a problem after I'd
already sent off the email. But you have to "cheat", because the
default cost estimator functions left, overleft, right, and overright
all return a large fixed value (0.1, IIRC two orders of magnitude over
areasel/areajoinsel) which all but guarantees sequential scan for any
relvar with a reasonable number of tuples (I tried 100,000 tuples and
still got sequential scan).

These are, of course, constant cost estimator functions.

The following Perl script temporarily replaces oprrest with "areasel"
and oprjoin with "areajoinsel" (instead of positionsel and
positionjoinsel rsp.). WARNING: This assumes oids 493 and 494
correspond to box_ops << and &< respectively, that happens to be true in
the pgsql version I'm using at the moment. Could be replaced with test
for operator name and data type, of course.

Suggested remediation, if this is reproducible:

1. Modify overleft and overright to actually mean "overlap or left" and
"overlap or right" respectively, which might break existing code, or

2. Add two new functions, e.g., "overlap_or_left" and
"overlap_or_right", and two operators (e.g., "&&||<<" and "&&||>>" ...
yes, I'm kidding about the ugly operator names), and modify the operator
class definition to use the new operators instead of the old. Existing
code would continue to work (since &< and &> would now never invoke
index scan), new code could take advantage of index scan on position
operators (assuming the cost estimator were modified to permit this).

Code follows. Apologies for Perl rather than straight SQL, this was
adapted from part of a test suite for an interval (as in "interval in R"
not the IMNSHO misnamed SQL "interval" type) extension I'm writing,
which is why I ran across the problem.

---- cut here ----

#!/usr/bin/perl
open(P,"|psql >/dev/null 2>&1");
print P <<"EOF";
DROP INDEX i_test2;
DROP TABLE test2;
CREATE TABLE test2
(
i box NOT NULL
);

CREATE INDEX i_test2 ON test2 using rtree (i);
EOF

$t0 = time;
for ($i=1;$i<=1000;$i++)
{
printf(P "INSERT INTO test2 (i) VALUES ('(%d,%d),(%d,%d)');\n",
$i+1,1,$i,0);
}
print P "VACUUM ANALYZE;\n";
close(P);

open(P,"|psql");
print P <<"EOF";

-- Temporarily use cost estimators for areasel/areajoinsel on << and &<
update pg_operator set oprrest = 'areasel', oprjoin = 'areajoinsel'
where oid = 494 or oid = 493;

-- This works (at least for 10 results) since it'll be towards the right
-- half of leaves

SELECT * FROM test2 WHERE i &< ('(999.5,1),(998.5,0)') LIMIT 1;

-- This fails since it's to the left half of leaves

SELECT * FROM test2 WHERE i &< ('(11.5,1),(10.5,0)') LIMIT 1;

-- Restore original cost estimators
update pg_operator set oprrest = 'positionsel', oprjoin = 'positionjoinsel'
where oid = 494 or oid = 493;

-- Now they both work:
SELECT * FROM test2 WHERE i &< ('(999.5,1),(998.5,0)') LIMIT 1;
SELECT * FROM test2 WHERE i &< ('(11.5,1),(10.5,0)') LIMIT 1;

EOF

---- cut here ----

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John DeSoi 2004-03-31 17:30:58 row-level security model
Previous Message Diogo Biazus 2004-03-31 17:19:06 Re: Wich hardware suits best for large full-text indexed

Browse pgsql-patches by date

  From Date Subject
Next Message William White 2004-03-31 17:42:52 Re: Question about rtrees (overleft replacing left in nodes)
Previous Message Bruce Momjian 2004-03-31 16:19:07 Re: Some Documentation Changes