Re: OK, does anyone have any better ideas?

From: mlw <markw(at)mohawksoft(dot)com>
To: Andrew Snow <als(at)fl(dot)net(dot)au>
Cc: Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OK, does anyone have any better ideas?
Date: 2000-12-09 02:40:14
Message-ID: 3A319B8E.14F0B01E@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-novice

Andrew Snow wrote:
>
> Could you perhaps post the code you have for splitting a text field up into
> keys, then I could work on turning into a new type of index with a new
> operator, as Tom suggested?
>
> (Or is this already what the text search code in contrib already does??)
>
> - Andrew

OK, I guess I am not getting everything across. Let me give the basics:

There are two programs: sqlindex, and sqlfts.

sqlindex, is the SQL indexer.
sqlfts, is the SQL full text server.

They currently take a config file, which will be replaced by columns in
the database. (This technology is aimed at multiple databases and
non-SQL uses) The config file currently looks like the example at the
bottom of this post.

The current incarnation of this server sits outside of Postgres and
execute joins based the results of the query.

The indexing query returns a number of fields, one must be designated as
the "key" field. In websearch lingo, think of it as "document name."
During index time, I separate the individual fields and create bitmap
files which relate word numbers to document bits.

Words are parsed and a dictionary is created. Phrase meta-data is also
stored along with the document reference (key field) associated with a
document number.

When a query is executed, each word is picked out of the dictionary. At
various points, phrases are evaluated, the bitmap indexes are ANDed,
ORed, or NOTed together, rank is applied. The results are then sorted by
rank, and the document numbers are merged in with document "references"
(key field value) and return with the rank.

This technology works quite well as a search engine sort of thing if I
store a URL or file name and a teaser as the document reference. I
thought it would be cool (and easy) if I just stored a SQL key field as
the URL, and connected this stuff to a SQL database. I chose Postgres
because I had used it in a number of projects, and thought since it was
open source I would have fewer problems.

It has not been easy to do what I thought would be a fairly trivial
task. I am starting to get Windows programming flashbacks of the "so
close, but yet so far" feeling one gets when one tries to do
conceptually simple things on Windows.

I'm sorry I am getting discouraged and beginning to think that this
project is not going to work.

>>>>>>>>>>> configuration file <<<<<<<<<<<<<<<<<<<<<<
# The computer host name used for the database
sqlindex=localhost
sqlfts=localhost

# The name of the database
sqldb=cdinfo

# Base name of the index files.
basename=cdinfo/index

# The key field used to index and find records.
sqlkey=trackid
sqlkeyindex=off
metaphone=1

# A SQL query that produces a single result, which is the count of
# records to be indexed.
sqlcount=select count(trackid) from zsong

# The SQL query used to produce data to be indexed.
sqlretrieve=select * from songview;
sqlfields = all,performer2,title,song

# A SQL query that will be used to display a list records found
sqldisplay=select zsong.muzenbr, performer, performer2, title, song,
trackid from ztitles, zsong where zsong.muzenbr = ztitles.muzenbr and
zsong.trackid = %s

# The tcport is the TCP/IP port for the server
tcpport = 8090
ftsproc = 5
ftsqueue = 32
<<<<<<<<<<<<

--
http://www.mohawksoft.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Horst Herb 2000-12-09 02:58:29 CRC, hash & Co.
Previous Message Daniele Orlandi 2000-12-09 02:31:27 Re: European Datestyle

Browse pgsql-novice by date

  From Date Subject
Next Message Oleg Bartunov 2000-12-09 08:50:17 Re: OK, does anyone have any better ideas?
Previous Message mlw 2000-12-09 02:07:22 Re: OK, does anyone have any better ideas?