| From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> | 
|---|---|
| To: | Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, PgSQL General <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: [HACKERS] Index greater than 8k | 
| Date: | 2006-11-01 04:24:15 | 
| Message-ID: | 4548216F.10909@commandprompt.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general pgsql-hackers | 
Alvaro Herrera wrote:
> Darcy Buskermolen wrote:
>> On October 31, 2006 08:53 am, Teodor Sigaev wrote:
>>>> The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent
>>>> a self contained test case directly to  Teodor  which shows the error.
>>>>
>>>> 'ERROR:  index row requires 8792 bytes, maximum size is 8191'
>>> Uh, I see. But I'm really surprised why do you use pg_trgm on big text?
>>> pg_trgm is designed to find similar words and use technique known as
>>> trigrams. This will work good on small pieces of text such as words or set
>>> expression. But all big texts (on the same language) will be similar :(.
>>> So, I didn't take care about guarantee that index tuple's size limitation.
>>> In principle, it's possible to modify pg_trgm to have such guarantee, but
>>> index becomes lossy - all tuples gotten  from index should be checked by
>>> table's tuple evaluation.
>> The problem is some of the data we are working with is not strictly "text" but 
>> bytea that we've run through encode(bytea, 'escape'),
> 
> I think one good question is why are you storing bytea and then
> searching like it were text. 
We are not storing bytea, a customer is. We are trying to work around
customer requirements. The data that is being stored is not always text,
sometimes it is binary (a flash file or jpeg). We are using escaped text
to be able to search the string contents of that file .
> Why not store the text as text, and put
> the extraneous bytes somewhere else?  Certainly you wouldn't expect to
> be able to find text among the bytes, would you?
Yes we do (and can) expect to find text among the bytes. We have
searches running, we are just running into the maximum size issues for
certain rows.
Sincerely,
Joshua D. Drake
--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Gregory S. Williamson | 2006-11-01 04:36:55 | Re: [HACKERS] Index greater than 8k | 
| Previous Message | Tom Lane | 2006-11-01 04:23:59 | Re: postgres import | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2006-11-01 04:34:01 | Re: Design Considerations for New Authentication Methods | 
| Previous Message | Joshua D. Drake | 2006-11-01 03:46:48 | Re: [HACKERS] Index greater than 8k |