Skip site navigation (1) Skip section navigation (2)

Re: [GENERAL] Creation of tsearch2 index is very

From: Ron <rjpeace(at)earthlink(dot)net>
To: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Date: 2006-01-27 03:33:07
Message-ID: 7.0.1.0.2.20060126221829.037961c8@earthlink.net (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-performance
You seem to have missed my point.  I just gave a very clear 
description of how to "decide which bitmaps go in each of the two 
buckets" by reformulating the question into "decide which bitmaps go 
in each of =four= buckets".

The intent is to have two indexes, one optimized for one common class 
of searches, and the other optimized for another common class of searches.

By decomposing the optimization problem into two simpler problems, 
the hope is that we address all the issues reasonably simply while 
still getting decent performance.

Nothing is free.  The price we pay, and it is significant, is that we 
now have two indexes where before we had only one.

Ron


At 09:29 PM 1/26/2006, Craig A. James wrote:
>Ron <rjpeace(at)earthlink(dot)net> writes:
>>We have two problems here.
>>The first is that the page splitting code for these indexes 
>>currently has O(N^2) performance.
>>The second is that whatever solution we do use for this 
>>functionality, we still need good performance during searches that 
>>use the index.
>
>No, unfortunately that's not the problem that needs to be solved.
>
>The problem is figuring out WHICH records to put in the "left" and 
>"right" trees once you split them.  If you can figure that out, then 
>your suggestion (and perhaps other techniques) could be useful.
>
>The problem boils down to this:  You have a whole bunch of 
>essentially random bitmaps.  You have two buckets.  You want to put 
>half of the bitmaps in one bucket, and half in the other bucket, and 
>when you get through, you want all of the bitmaps in each bucket to 
>be maximally similar to each other, and maximally dissimilar to the 
>ones in the other bucket.
>
>That way, when you OR all the bitmaps in each bucket together to 
>build the bitmap for the left and right child nodes of the tree, 
>you'll get maximum separation -- the chances that you'll have to 
>descend BOTH the left and right nodes of the tree are minimized.
>
>Unfortunately, this problem is very likely in the set of NP-complete 
>problems, i.e. like the famous "Traveling Salesman Problem," you can 
>prove there's no algorithm that will give the answer in a reasonable 
>time.  In this case, "reasonable" would be measured in milliseconds 
>to seconds, but in fact an actual "perfect" split of a set of 
>bitmaps probably can't be computed in the lifetime of the universe 
>for more than a few hundred bitmaps.
>
>That's the problem that's being discussed: How do you decide which 
>bitmaps go in each of the two buckets?  Any solution will 
>necessarily be imperfect, a pragmatic algorithm that gives an 
>imperfect, but acceptable, answer.
>
>As I mentioned earlier, chemists make extensive use of bitmaps to 
>categorize and group molecules.  They use Tanimoto or Tversky 
>similarity metrics (Tanimoto is a special case of Tversky), because 
>it's extremely fast to compare two bitmaps, and the score is highly 
>correlated with the number of bits the two bitmaps have in common.
>
>But even with a fast "distance" metric like Tanimoto, there's still 
>no easy way to decide which bucket to put each bitmap into.
>
>Craig



In response to

Responses

pgsql-performance by date

Next:From: RonDate: 2006-01-27 07:52:45
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Previous:From: Craig A. JamesDate: 2006-01-27 02:29:22
Subject: Re: [GENERAL] Creation of tsearch2 index is very

pgsql-general by date

Next:From: Matthew T. O'ConnorDate: 2006-01-27 03:48:11
Subject: Re: VACUUM Question
Previous:From: Rich ShepardDate: 2006-01-27 02:53:00
Subject: Re: What Could Cause This Behavior?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group