RESOLVED: Re: Maximum document-size of text-search?

From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-general(at)postgresql(dot)org
Subject: RESOLVED: Re: Maximum document-size of text-search?
Date: 2010-07-22 13:55:07
Message-ID: 4C484DBB.2010702@officenet.no
Lists: pgsql-general

On 07/22/2010 03:31 PM, Andreas Joseph Krogh wrote:
> Hi.
> I'm trying to index the contents of word-documents, extracted text,
> which leads to quite large documents sometimes. This results in the
> following Exception:
> Caused by: org.postgresql.util.PSQLException: ERROR: index row
> requires 10376 bytes, maximum size is 8191
>
> I have the following schema:
> andreak=# \d origo_search_index
>                       Table "public.origo_search_index"
>           Column          |       Type        | Modifiers
> --------------------------+-------------------+-----------------------------------------------------------------
>  id                       | integer           | not null default nextval('origo_search_index_id_seq'::regclass)
>  entity_id                | integer           | not null
>  entity_type              | character varying | not null
>  field                    | character varying | not null
>  search_value             | character varying | not null
>  textsearchable_index_col | tsvector          |
>
> "origo_search_index_fts_idx" gin (textsearchable_index_col)
>
> Triggers:
> update_search_index_tsvector_t BEFORE INSERT OR UPDATE ON
> origo_search_index FOR EACH ROW EXECUTE PROCEDURE
> tsvector_update_trigger('textsearchable_index_col',
> 'pg_catalog.english', 'search_value')
>
> I store all the text extracted from the documents in "search_value"
> and have the built-in trigger tsvector_update_trigger update the
> tsvector-column.
>
> Any hints on how to get around this issue to allow indexing large
> documents? I don't see how "only index the first N bytes of the
> document" would be of interest to anyone...
>
> BTW: I'm using PG-9.0beta3

Never mind... I also had a btree index on search_value, which of
course caused the problem.
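
For anyone hitting the same error, here is a minimal sketch of how the
offending index might be found and replaced. A btree index stores the
indexed value itself in each index entry, so very long text in
search_value cannot be indexed that way. The index names below are
hypothetical (the real btree index name is not shown in the \d output
above), and the md5-based index is only one possible substitute if
equality lookups on search_value are still needed.

```sql
-- List every index defined on the table, including any not shown above.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'origo_search_index';

-- Hypothetical name: substitute whatever the query above reports for
-- the btree index on search_value.
DROP INDEX origo_search_index_search_value_idx;

-- Optional (assumption: exact-match lookups are still wanted).
-- Index a fixed-size hash of the text instead of the text itself.
CREATE INDEX origo_search_index_search_value_md5_idx
    ON origo_search_index (md5(search_value));

-- A query has to use the same expression for the planner to consider
-- the expression index.
SELECT id
FROM origo_search_index
WHERE md5(search_value) = md5('some extracted text');
```

Full-text queries keep using the GIN index on textsearchable_index_col;
only the btree index on the raw text needs to go.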

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Rosenholmveien 25 | know how to do a thing and to watch |
1414 Trollåsen | somebody else doing it wrong, without |
NORWAY | comment. |
| |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+
