Re: Fastest Index/Algorithm to find similar sentences

From: Dann Corbit <DCorbit(at)connx(dot)com>
To: 'Janek Sendrowski' <janek12(at)web(dot)de>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fastest Index/Algorithm to find similar sentences
Date: 2013-07-26 05:27:29
Message-ID: 87F42982BF2B434F831FCEF4C45FC33E64F119D6@EXCHANGE.corporate.connx.com
Lists: pgsql-general

Of course, you can use regular expressions and LIKE, but without knowing the structure of your database I can't say whether that can be made efficient. For a collection of sentences, I suspect it would get complicated and would probably be slow. My guess is that what you want will be hard to do efficiently in a standard relational database using commonly used functions such as LIKE and regular-expression matching.

Perhaps one of the bioinformatics projects like PostBIO or PostBIS can be adapted to suit your needs. They are built to find similar sequences quickly, even very complex ones, but they are designed specifically for DNA sequences.
Just a thought.
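
For reference, the kind of trigram comparison that pg_trgm performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's actual code: the function names `trigrams` and `similarity` are made up for the example, word splitting is restricted to ASCII letters and digits, and the score is the ratio of shared trigrams to all distinct trigrams.

```python
import re


def trigrams(text):
    """Return the set of trigrams in text, loosely following pg_trgm's conventions.

    Assumptions for this sketch: the text is lowercased, split into words on
    anything that is not an ASCII letter or digit, and each word is padded
    with two leading spaces and one trailing space before trigrams are cut.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(similarity("word", "two words"))
```

Comparing one sentence against every stored sentence this way touches every row, so with 100,000+ rows the cost depends heavily on whether an index can narrow the candidates before the comparison runs.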

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Janek Sendrowski
Sent: Thursday, July 25, 2013 3:55 PM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Fastest Index/Algorithm to find similar sentences

Hi,

I'm searching for an algorithm/Index to find similar sentences in a database.

Full-text search is not really suitable because it has no tolerance for inexact matches.

The Levenshtein distance is too slow.

I also tried the pg_trgm module, which works with trigrams, but it's also very slow with 100,000+ rows.

I hope someone can help; I can't really find anything that is fast enough.

Best regards,
Janek
 
 

