Full Text Index disk space requirements

From: <typea(at)l-i-e(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Full Text Index disk space requirements
Date: 2002-11-26 09:36:59
Message-ID: 61258.12.249.229.112.1038303419.squirrel@www.l-i-e.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

So, I'm trying to create a full text index as described here:

http://techdocs.postgresql.org/techdocs/fulltextindexing.php

Everything was going mostly okay...

I had to hack a quick PHP script instead of using the Perl once since I
didn't have a working Pg.pm, but that was a minor speed bump.

Then I hit a real road-block...

\copy article_fti from fulltext.sorted
\.
ERROR: copy: line 34635390, cannot extend article_fti: No space left on
device.
Check free disk space.
PQendcopy: resetting connection
archive=> \q
[root(at)rm-004-24 utilities]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 15G 15G 0 100% /
/dev/sda1 48M 6.1M 39M 14% /boot
none 439M 0 439M 0% /dev/shm

Oh. Yeah. I guess that *IS* going to be kind of big...

Any SWAGs how much disk space is required for a 500 M fulltext.sorted file?

IE, the ASCII file of string/OID pairs, in tab-delimited form, is 500 M --
How much PostgreSQL space does that turn into with the tables/indices as
described the URL above?

When I deleted all the fti rows, and did a VACUUM, there was almost 2G
available...

ALSO:
Wouldn't using f1.string = 'perth' be faster than f1.string ~ '^perth' and
equally useful? Or is ~ with ^ somehow actually faster than the seemingly
simple = comparison?

AND:
Would using OR for the individual word comparisons be a big drag on speed?
I'd kind of like to give ranked results based on how many of the terms
were present rather than a complete filter.

I'd be happy to try the EXPLAIN queries, but I don't think they're going
to be accurate without the data in the FTI table...

I got some weird results when I did a test run with a very small dataset
in the FTI table -- But I also think I was doing it in the middle of a
train-wreck between daily VACUUM and pg_dump, which were thrashing each
other with all the FTI data I had imported just for the small test...

I've altered the cron jobs to have more time in between.

THANKS IN ADVANCE!

Browse pgsql-performance by date

  From Date Subject
Next Message Maarten Boekhold 2002-11-26 12:48:42 Re: Full Text Index disk space requirements
Previous Message Ron Johnson 2002-11-26 04:27:41 Re: performance of insert/delete/update