Re: tsvector limitations

From: Tim <elatllat(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: tsvector limitations
Date: 2011-06-14 20:42:49
Message-ID: BANLkTiksUZuiKrtbsDfUkfYUU=zMEgK2cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

So I ran this test:
unzip -p text.docx word/document.xml | perl -p -e
's/<.+?>/\n/g;s/[^a-z0-9\n]/\n/ig;'|grep ".." > text.txt
ls -hal ./text.*
#-rwxrwxrwx 1 postgres postgres 15M 2011-06-14 15:12 ./text.docx
#-rwxrwxrwx 1 postgres postgres 29M 2011-06-14 15:17 ./text.txt
mv /tmp/text.* /var/lib/postgresql/9.0/main/
cd ~/;psql -d postgres
#psql (9.0.4)
CREATE DATABASE test;
\q
cd ~/;psql -d test
CREATE TABLE test(title VARCHAR(256), data OID, words TSVECTOR);
INSERT INTO test VALUES ( 'text.docx', LO_IMPORT('text.docx'),
TO_TSVECTOR(pg_read_file('text.txt' ,0, 100000000)) );

and I got this:
#ERROR: string is too long for tsvector (30990860 bytes, max 1048575
bytes)

doing the math
echo "scale=3;29/(30990860/1048575)"|bc #==0.981

Indicates a (worst case) limit of searching only the first 1MB of a text
file before you need to start building your own search maybe on top of
tsvector.
The year is 2011 I don't think searching a 2MB text file is to much to
expect.
So how am I to use the PGSQL FTS as a "full text search" when the above
example can only handle a "small or partial text search"?
If I'm not missing anything maybe the documentation should be adjusted
accordingly.

Any thoughts or alternatives are most welcome.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Craig James 2011-06-14 21:12:05 Re: tsvector limitations
Previous Message sundaram 2011-06-14 19:13:56 Re: psql shell return codes - checking if database exists