several questions about R-tree index

From: "TJ O'Donnell" <tjo(at)acm(dot)org>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: several questions about R-tree index
Date: 2005-04-26 23:52:05
Message-ID: 3366.209.223.166.104.1114559525.squirrel@gnova.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

According to the manual at:
http://www.postgresql.org/docs/7.4/static/functions-geometry.html
" The PostgreSQL query planner will consider using an R-tree index whenever an indexed column is
involved in a comparison using one of these operators: <<, &<, &>, >>, @, ~=, && (Refer to Section
9.9 about the meaning of these operators.)"

Shouldn't the ~ (contains) operator be included also?
Isn't ~ the commutator of @ ?

I am considering using R-tree's for other-than-geometric purposes. Here is the story:
I have string data (column smiles) which represents chemical structures. Comparing strings
for equality works perfectly using an ordinary B-tree index. But locating chemical
substructures requires a more elaborate (read time-consuming) match function.
I search for substructures like this:

Select count(smiles) from structures where matches(smiles, subsmiles);

where subsmiles is the search pattern desired. This is not a fast search - about
15 seconds for 0.25 million rows.
To speed up this search I have a function (called fingerprint) which produces a bit string
representation of some common and uncommon substructures. I have populated my table
with column fp = fingerprint(smiles)
So, this is about 5 times faster:

Select count(smiles) from structures where
(fp & fingerprint(subsmiles)) = fingerprint(subsmiles)
& matches(smiles, subsmiles);
The sequence scan using the first where-clause takes about 3 seconds and the final
matches on the resulting subset is insignificant when the subset is small.

But I think I might be able to do better (faster) using R-trees.
Bitstrings can be thought of as "containing" when one bitstring has all the
same bits set as another, even if it has other bits set too - this is the gist of the
first where-clause above.

Can I expect to be able to use R-tree's to do this?
Will I simply have to define a new datatype and three R-tree functions
(union, intersection and size).
Will the use of the ~ (contains) operator cause the planner to consider
using an index (this is my first question, way above)?

I hope someone has had the patience and interest to read this far.

Thanks,
TJ

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Andrew - Supernews 2005-04-27 00:39:36 Re: several questions about R-tree index
Previous Message Lord Knight of the Black Rose 2005-04-26 20:18:01 can someone jelp me on this?