Quick Links

Re: [GENERAL] Creation of tsearch2 index is very

From:	Ron <rjpeace(at)earthlink(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,pgsql-performance(at)postgresql(dot)org
Subject:	Re: [GENERAL] Creation of tsearch2 index is very
Date:	2006-01-27 00:18:35
Message-ID:	7.0.1.0.2.20060126190219.0388a918@earthlink.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-performance

At 01:27 PM 1/21/2006, Tom Lane wrote:
>Ron <rjpeace(at)earthlink(dot)net> writes:
> > At 07:23 PM 1/20/2006, Tom Lane wrote:
> >> Well, we're trying to split an index page that's gotten full into
> >> two index pages, preferably with approximately equal numbers of items in
> >> each new page (this isn't a hard requirement though).
>
> > Maybe we are over thinking this. What happens if we do the obvious
> > and just make a new page and move the "last" n/2 items on the full
> > page to the new page?
>
>Search performance will go to hell in a handbasket :-(. We have to make
>at least some effort to split the page in a way that will allow searches
>to visit only one of the two child pages rather than both.
>
>It's certainly true though that finding the furthest pair is not a
>necessary component of that. It's reasonable if you try to visualize
>the problem in 2D or 3D, but I'm not sure that that geometric intuition
>holds up in such a high-dimensional space as we have here.
After reading the various papers available on GiST and RD trees, I
think I have a decent suggestion.

Since RD tree keys contain the keys of their descendents/components
in them, they are basically a representation of a depth first
search. This can be useful for intra-document searches.

OTOH, inter-document searches are going to be more akin to breadth
first searches using RD trees.

Thus my suggestion is that we maintain =two= index structures for text data.

The first contains as many keys and all their descendents as
possible. When we can no longer fit a specific complete "path" into
a page, we start a new one; trying to keep complete top level to leaf
key sets within a page. This will minimize paging during
intra-document searches.

The second index keeps siblings within a page and avoids putting
parents or children within a page unless the entire depth first
search can be kept within the page in addition to the siblings
present. This will minimize paging during inter-document searches.

Traditional B-tree ordering methods can be used to define the
ordering/placement of pages within each index, which will minimize
head seeks to find the correct page to scan.

Since the criteria for putting a key within a page or starting a new
page is simple, performance for those tasks should be O(1).

The price is the extra space used for two indexes instead of one, but
at first glance that seems well worth it.

Comments?
Ron

In response to

Re: [GENERAL] Creation of tsearch2 index is very at 2006-01-21 18:27:53 from Tom Lane

Responses

Re: [GENERAL] Creation of tsearch2 index is very at 2006-01-27 01:00:55 from Alvaro Herrera

Browse pgsql-general by date

	From	Date	Subject
Next Message	Alvaro Herrera	2006-01-27 01:00:55	Re: [GENERAL] Creation of tsearch2 index is very
Previous Message	Bob Pawley	2006-01-26 23:50:11	Re: Arrays

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Alvaro Herrera	2006-01-27 01:00:55	Re: [GENERAL] Creation of tsearch2 index is very
Previous Message	Jozsef Szalay	2006-01-26 18:03:36	Re: Incorrect Total runtime Reported by Explain Analyze!?