Skip site navigation (1) Skip section navigation (2)

Re: [GENERAL] Creation of tsearch2 index is very

From: Ron <rjpeace(at)earthlink(dot)net>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, pgsql-performance(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Date: 2006-01-21 16:11:53
Message-ID: 7.0.1.0.2.20060121105907.03955090@earthlink.net (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-performance
Perhaps a different approach to this problem is called for:

_Managing Gigabytes: Compressing and Indexing Documents and Images_  2ed
Witten, Moffat, Bell
ISBN 1-55860-570-3

This is a VERY good book on the subject.

I'd also suggest looking at the publicly available work on indexing 
and searching for search engines like Inktomi (sp?) and Google.
Ron


At 08:34 AM 1/21/2006, Oleg Bartunov wrote:
>On Sat, 21 Jan 2006, Ron wrote:
>
>>At 07:23 PM 1/20/2006, Tom Lane wrote:
>>>"Steinar H. Gunderson" <sgunderson(at)bigfoot(dot)com> writes:
>>> > On Fri, Jan 20, 2006 at 06:52:37PM -0500, Tom Lane wrote:
>>> >> It's also worth considering that the entire approach is a heuristic,
>>> >> really --- getting the furthest-apart pair of seeds doesn't guarantee
>>> >> an optimal split as far as I can see.  Maybe there's some totally
>>> >> different way to do it.
>>> > For those of us who don't know what tsearch2/gist is trying to accomplish
>>> > here, could you provide some pointers? :-)
>>>Well, we're trying to split an index page that's gotten full into 
>>>two index pages, preferably with approximately equal numbers of items in
>>>each new page (this isn't a hard requirement though).
>>
>>Maybe we are over thinking this.  What happens if we do the obvious 
>>and just make a new page and move the "last" n/2 items on the full 
>>page to the new page?
>>
>>Various forms of "move the last n/2 items" can be tested here:
>>0= just split the table in half.  Sometimes KISS  works. O(1).
>>1= the one's with the highest (or lowest) "x" value.
>>2= the one's with the highest sum of coordinates (x+y+...= values 
>>in the top/bottom n/2 of entries).
>>3= split the table so that each table has entries whose size_waste 
>>values add up to approximately the same value.
>>4= I'm sure there are others.
>>1-5 can be done in O(n) time w/o auxiliary data.  They can be done 
>>in O(1) if we've kept track of the appropriate metric as we've 
>>built the current page.
>>
>>
>>>I think the true figure of merit for a split is how often will 
>>>subsequent searches have to descend into *both* of the resulting 
>>>pages as opposed to just one
>>>--- the less often that is true, the better.  I'm not very clear 
>>>on what tsearch2 is doing with these bitmaps, but it looks like an 
>>>upper page's downlink has the union (bitwise OR) of the one-bits 
>>>in the values on the lower page, and you have to visit the lower 
>>>page if this union has a nonempty intersection with the set you 
>>>are looking for.  If that's correct, what you really want is to 
>>>divide the values so that the unions of the two sets have minimal 
>>>overlap ... which seems to me to have little to do with what the 
>>>code does at present.
>>I'm not sure what "upper page" and "lower page" mean here?
>>
>>
>>>Teodor, Oleg, can you clarify what's needed here?
>>Ditto.  Guys what is the real motivation and purpose for this code?
>
>we want not just split the page by two very distinct parts, but to keep
>resulted signatures which is ORed signature of all signatures in the page
>as much 'sparse' as can. some information available here
>http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>
>Unfortunately, we're rather busy right now and couldn't be very useful.



In response to

Responses

pgsql-performance by date

Next:From: Oleg BartunovDate: 2006-01-21 16:33:27
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Previous:From: Oleg BartunovDate: 2006-01-21 15:24:30
Subject: Re: [GENERAL] Creation of tsearch2 index is very slow

pgsql-general by date

Next:From: Oleg BartunovDate: 2006-01-21 16:33:27
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Previous:From: Oleg BartunovDate: 2006-01-21 15:24:30
Subject: Re: [GENERAL] Creation of tsearch2 index is very slow

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group