Skip site navigation (1) Skip section navigation (2)

Re: [GENERAL] Creation of tsearch2 index is very

From: Ron <rjpeace(at)earthlink(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,pgsql-performance(at)postgresql(dot)org
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Date: 2006-01-27 00:18:35
Message-ID: 7.0.1.0.2.20060126190219.0388a918@earthlink.net (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-performance
At 01:27 PM 1/21/2006, Tom Lane wrote:
>Ron <rjpeace(at)earthlink(dot)net> writes:
> > At 07:23 PM 1/20/2006, Tom Lane wrote:
> >> Well, we're trying to split an index page that's gotten full into
> >> two index pages, preferably with approximately equal numbers of items in
> >> each new page (this isn't a hard requirement though).
>
> > Maybe we are over thinking this.  What happens if we do the obvious
> > and just make a new page and move the "last" n/2 items on the full
> > page to the new page?
>
>Search performance will go to hell in a handbasket :-(.  We have to make
>at least some effort to split the page in a way that will allow searches
>to visit only one of the two child pages rather than both.
>
>It's certainly true though that finding the furthest pair is not a
>necessary component of that.  It's reasonable if you try to visualize
>the problem in 2D or 3D, but I'm not sure that that geometric intuition
>holds up in such a high-dimensional space as we have here.
After reading the various papers available on GiST and RD trees, I 
think I have a decent suggestion.

Since RD tree keys contain the keys of their descendents/components 
in them, they are basically a representation of a depth first 
search.  This can be useful for intra-document searches.

OTOH, inter-document searches are going to be more akin to breadth 
first searches using RD trees.

Thus my suggestion is that we maintain =two= index structures for text data.

The first contains as many keys and all their descendents as 
possible.  When we can no longer fit a specific complete "path" into 
a page, we start a new one; trying to keep complete top level to leaf 
key sets within a page.  This will minimize paging during 
intra-document searches.

The second index keeps siblings within a page and avoids putting 
parents or children within a page unless the entire depth first 
search can be kept within the page in addition to the siblings 
present.  This will minimize paging during inter-document searches.

Traditional B-tree ordering methods can be used to define the 
ordering/placement of pages within each index, which will minimize 
head seeks to find the correct page to scan.

Since the criteria for putting a key within a page or starting a new 
page is simple, performance for those tasks should be O(1).

The price is the extra space used for two indexes instead of one, but 
at first glance that seems well worth it.

Comments?
Ron





In response to

Responses

pgsql-performance by date

Next:From: Alvaro HerreraDate: 2006-01-27 01:00:55
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Previous:From: Jozsef SzalayDate: 2006-01-26 18:03:36
Subject: Re: Incorrect Total runtime Reported by Explain Analyze!?

pgsql-general by date

Next:From: Alvaro HerreraDate: 2006-01-27 01:00:55
Subject: Re: [GENERAL] Creation of tsearch2 index is very
Previous:From: Bob PawleyDate: 2006-01-26 23:50:11
Subject: Re: Arrays

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group