Re: [HACKERS] Index greater than 8k

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, PgSQL General <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Index greater than 8k
Date: 2006-11-01 03:46:48
Message-ID: 454818A8.7060300@commandprompt.com
Lists: pgsql-general, pgsql-hackers

Teodor Sigaev wrote:
>> The problem as I remember it is pg_trgm, not tsearch2 directly. I've
>> sent a self-contained test case directly to Teodor which shows the
>> error.
>> 'ERROR:  index row requires 8792 bytes, maximum size is 8191'
> Uh, I see. But I'm really surprised that you use pg_trgm on big text.
> pg_trgm is designed to find similar words and uses a technique known as
> trigrams. This works well on small pieces of text such as words or
> set expressions. But all big texts (in the same language) will be
> similar :(. So I didn't take care to guarantee the index tuple size
> limit. In principle, it's possible to modify pg_trgm to provide such a
> guarantee, but then the index becomes lossy: all tuples returned from
> the index must be rechecked against the table's tuple.
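
For readers unfamiliar with the trigram technique mentioned above, here is a
minimal sketch in Python. It is not pg_trgm's actual code: the function names
are hypothetical, and the padding rule (two spaces before each word, one
after) and the shared/union similarity ratio are assumptions modelled on how
pg_trgm is documented to behave.

```python
import re


def trigrams(text):
    """Return the set of character trigrams for every word in text.

    Assumption: words are runs of lowercase letters/digits, and each
    word is padded with two leading spaces and one trailing space.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Ratio of shared trigrams to all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(similarity("word", "ward"))
```

The set of trigrams grows with the number of distinct words, which is
consistent with a long document producing an oversized index entry, and two
long texts in the same language tend to share a large part of their trigram
sets, which is the "all big texts will be similar" effect described above.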

We are trying to get something faster than ~ '%foo%';

Which Tsearch2 does not give us :)

Joshua D. Drake



> 
> If you want to search for similar documents, I can recommend having a
> look at the fingerprint technique (http://webglimpse.net/pubs/TR93-33.pdf).
> It's pretty close to trigrams and the similarity metric is the same, but
> it uses a different signature calculation. And there are some tips and
> tricks: removing HTML markup, removing punctuation, lowercasing text and
> so on - it's an interesting and complex task.
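
The preprocessing tips listed above (removing HTML markup, removing
punctuation, lowercasing) could be sketched roughly as follows. This is an
illustrative assumption only, not the method from the linked paper; the
function name is hypothetical, and a regex is a crude way to strip tags that
real HTML would need a proper parser for.

```python
import re
import string

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text):
    """Strip tags, lowercase, drop punctuation, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)       # crude tag removal
    lowered = no_tags.lower()
    no_punct = lowered.translate(_PUNCT_TO_SPACE)
    return " ".join(no_punct.split())


if __name__ == "__main__":
    print(normalize("<p>Hello, World!</p>"))
```

Normalizing before computing signatures means that two copies of a document
differing only in markup or capitalization yield the same input text.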


-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

