Re: advice on indexing email

From: Maarten Boekhold <maarten(dot)boekhold(at)tibcofinance(dot)com>
To: Marc Tardif <intmktg(at)cam(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: advice on indexing email
Date: 2000-04-28 12:04:50
Message-ID: 39097E62.9FB27E81@tibcofinance.com
Lists: pgsql-general

Hi,

I wrote that fti stuff in contrib...

> My problem is how to create the full word index. The actual code to
> separate the email into separate words isn't a problem, but should I be
> using INSERT, BEGIN/END or COPY? In this last case, I would have to create
> a temporary file holding each word of the email and then use COPY... all
> of which also has its fair share of overhead.

You can do this in one of two ways.

1. The fti stuff in contrib uses triggers, so every time you
insert/update/delete something in/from the 'fti-ed' table, the full text index
is also updated. If your coding abilities are OK, you can just replace the
word breakup code in contrib/fti with your own.

2. If you have to insert large amounts of data, it is probably faster to *not*
create the triggers at first. Bulk load all your data, then write a little Perl
script that reads the data from your table, does the word breakup and inserts
those words into the full text index table. Running 'sort' on the output of
the Perl script will help performance, as the fti data will then already be
pre-sorted in the database (you could also use CLUSTER on the fti table after
the index has been created). I think I described this somewhat better in the
README in contrib/fti. If you take this approach, don't forget to create the
triggers after the bulk load of the fti table!
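
To make the second approach a bit more concrete, here is a rough sketch of the
word breakup and pre-sort step. It is written in Python rather than Perl, and it
is not the code from contrib/fti. The function names, the two-column layout
(word, row id), the lowercasing and the minimum word length of 2 are all
assumptions made for illustration. It prints tab-separated lines, which is the
default text format that COPY reads.

```python
import re


def break_words(text, min_len=2):
    """Split text into lowercase alphanumeric words.

    Words shorter than min_len are dropped (an assumed cutoff, adjust freely).
    """
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= min_len]


def fti_rows(records, min_len=2):
    """Return sorted, de-duplicated (word, row_id) pairs.

    records is an iterable of (row_id, text) tuples, e.g. read from the
    bulk-loaded table. Sorting here plays the role of running 'sort' on
    the script output.
    """
    pairs = set()
    for row_id, text in records:
        for word in break_words(text, min_len):
            pairs.add((word, row_id))
    return sorted(pairs)


def to_copy_lines(rows):
    """Format (word, row_id) pairs as tab-separated lines for COPY."""
    return ["%s\t%d" % (word, row_id) for word, row_id in rows]


if __name__ == "__main__":
    # Hypothetical sample data standing in for rows of the real table.
    sample = [(1, "Advice on indexing email"), (2, "Indexing email with fti")]
    print("\n".join(to_copy_lines(fti_rows(sample))))
```

You would redirect the output to a file, load that file into the fti table
with COPY, and only then create the index and the triggers.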

Maarten

--

Maarten Boekhold, maarten(dot)boekhold(at)tibcofinance(dot)com
TIBCO Finance Technology Inc.
"Sevilla" Building
Entrada 308
1096 ED Amsterdam, The Netherlands
tel: +31 20 6601000 (direct: +31 20 6601066)
fax: +31 20 6601005
http://www.tibcofinance.com
