From: | <typea(at)l-i-e(dot)com> |
---|---|
To: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Full Text Index disk space requirements |
Date: | 2002-11-26 09:36:59 |
Message-ID: | 61258.12.249.229.112.1038303419.squirrel@www.l-i-e.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
So, I'm trying to create a full text index as described here:
http://techdocs.postgresql.org/techdocs/fulltextindexing.php
Everything was going mostly okay...
I had to hack a quick PHP script instead of using the Perl once since I
didn't have a working Pg.pm, but that was a minor speed bump.
Then I hit a real road-block...
\copy article_fti from fulltext.sorted
\.
ERROR: copy: line 34635390, cannot extend article_fti: No space left on
device.
Check free disk space.
PQendcopy: resetting connection
archive=> \q
[root(at)rm-004-24 utilities]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 15G 15G 0 100% /
/dev/sda1 48M 6.1M 39M 14% /boot
none 439M 0 439M 0% /dev/shm
Oh. Yeah. I guess that *IS* going to be kind of big...
Any SWAGs how much disk space is required for a 500 M fulltext.sorted file?
IE, the ASCII file of string/OID pairs, in tab-delimited form, is 500 M --
How much PostgreSQL space does that turn into with the tables/indices as
described the URL above?
When I deleted all the fti rows, and did a VACUUM, there was almost 2G
available...
ALSO:
Wouldn't using f1.string = 'perth' be faster than f1.string ~ '^perth' and
equally useful? Or is ~ with ^ somehow actually faster than the seemingly
simple = comparison?
AND:
Would using OR for the individual word comparisons be a big drag on speed?
I'd kind of like to give ranked results based on how many of the terms
were present rather than a complete filter.
I'd be happy to try the EXPLAIN queries, but I don't think they're going
to be accurate without the data in the FTI table...
I got some weird results when I did a test run with a very small dataset
in the FTI table -- But I also think I was doing it in the middle of a
train-wreck between daily VACUUM and pg_dump, which were thrashing each
other with all the FTI data I had imported just for the small test...
I've altered the cron jobs to have more time in between.
THANKS IN ADVANCE!
From | Date | Subject | |
---|---|---|---|
Next Message | Maarten Boekhold | 2002-11-26 12:48:42 | Re: Full Text Index disk space requirements |
Previous Message | Ron Johnson | 2002-11-26 04:27:41 | Re: performance of insert/delete/update |