Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Index greater than 8k

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, PgSQL General <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Index greater than 8k
Date: 2006-10-31 19:29:00
Message-ID: 20061031192900.GC12008@alvh.no-ip.org (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-hackers
Darcy Buskermolen wrote:
> On October 31, 2006 08:53 am, Teodor Sigaev wrote:
> > > The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent
> > > a self contained test case directly to  Teodor  which shows the error.
> > >
> > > 'ERROR:  index row requires 8792 bytes, maximum size is 8191'
> >
> > Uh, I see. But I'm really surprised why do you use pg_trgm on big text?
> > pg_trgm is designed to find similar words and use technique known as
> > trigrams. This will work good on small pieces of text such as words or set
> > expression. But all big texts (on the same language) will be similar :(.
> > So, I didn't take care about guarantee that index tuple's size limitation.
> > In principle, it's possible to modify pg_trgm to have such guarantee, but
> > index becomes lossy - all tuples gotten  from index should be checked by
> > table's tuple evaluation.
> 
> The problem is some of the data we are working with is not strictly "text" but 
> bytea that we've run through encode(bytea, 'escape'),

I think one good question is why are you storing bytea and then
searching like it were text.  Why not store the text as text, and put
the extraneous bytes somewhere else?  Certainly you wouldn't expect to
be able to find text among the bytes, would you?

I remember suggesting you to store the Content-type next to each object,
and then creating partial trigram indexes where Content-type: text/*.
Did that plan not work for some reason?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

In response to

Responses

pgsql-hackers by date

Next:From: Gregory StarkDate: 2006-10-31 19:49:27
Subject: Re: [HACKERS] WAL logging freezing
Previous:From: Darcy BuskermolenDate: 2006-10-31 18:49:37
Subject: Re: [HACKERS] Index greater than 8k

pgsql-general by date

Next:From: Merlin MoncureDate: 2006-10-31 19:48:00
Subject: Re: RAM Based Disk Drive?
Previous:From: AdamDate: 2006-10-31 19:13:55
Subject: RAM Based Disk Drive?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group