
String Similarity

From:	"Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	String Similarity
Date:	2006-05-19 20:00:48
Message-ID:	18405.24.91.171.78.1148068848.squirrel@mail.mohawksoft.com
Lists:	pgsql-hackers

I have a side project that needs to "intelligently" know if two strings
are contextually similar. Think about how CDDB information is collected
and sorted. It isn't perfect, but there should be enough information to be
usable.

Think about this:

"pink floyd - dark side of the moon - money"
"dark side of the moon - pink floyd - money"
"money - dark side of the moon - pink floyd"
etc.

To a human, these strings are almost identical. Similarly:

"dark floyd of money moon pink side the"

is the kind of puzzle a 13-year-old could solve before the movie starts.
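Both kinds of rearrangement can be detected exactly with simple normalizations. The following is a minimal sketch, assuming ` - ` is the field separator and that a "word" is a run of letters and digits (both are assumptions; `field_set` and `word_bag` are hypothetical helper names). Comparing fields as an unordered set catches the reordered CDDB strings, and comparing sorted words catches the scrambled one.

```python
import re

def field_set(s: str) -> frozenset[str]:
    """Split on ' - ' (assumed separator) and return the normalized fields as an unordered set."""
    return frozenset(part.strip().lower() for part in s.split(" - "))

def word_bag(s: str) -> list[str]:
    """Lowercase, keep only runs of letters/digits (so '-' is dropped), and sort the words."""
    return sorted(re.findall(r"[a-z0-9]+", s.lower()))

a = "pink floyd - dark side of the moon - money"
b = "dark side of the moon - pink floyd - money"
c = "money - dark side of the moon - pink floyd"
scrambled = "dark floyd of money moon pink side the"

# Field order is ignored, so the three CDDB-style strings normalize to the same set.
print(field_set(a) == field_set(b) == field_set(c))  # True
# Word order and punctuation are ignored, so the scrambled string matches too.
print(word_bag(a) == word_bag(scrambled))  # True
```

Both are all-or-nothing equality tests: a single misspelled word makes them fail, and the word bag also forgets which words belonged to the artist, album, or track. They do not give the graded score asked about in question (1).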

My post has three questions:

(1) Does anyone know of an efficient, numerically quantified method of
detecting these sorts of things? My current solution is fairly inefficient
and its scores are numerically bogus, yet it may be the only workable
approach to the problem.

(2) Does anyone see a need for this feature in PostgreSQL? If so, what
kind of interface would be most acceptable as a patch? I am currently
returning a match likelihood between 0 and 100.

(3) Is there also a desire for a Levenshtein distance function for text
and varchars? I experimented with Levenshtein distance and ended up having
to write such a function while working on item #1.
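A minimal sketch touching questions (1) to (3). It assumes word-level character-trigram Jaccard similarity as the "numerically quantified" measure. This is similar in spirit to PostgreSQL's contrib pg_trgm module, but the padding and scoring here are assumptions, not pg_trgm's exact rules. The sketch reports the result as an integer from 0 to 100, matching the interface described in (2). `trigrams` and `match_likelihood` are hypothetical names. `levenshtein` is the textbook dynamic-programming edit distance; PostgreSQL's contrib fuzzystrmatch module offers a `levenshtein()` function.

```python
import re

def trigrams(s: str) -> set[str]:
    """Character trigrams of each alphanumeric word, padded with spaces so that
    word boundaries contribute trigrams of their own (padding is an assumption)."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", s.lower()):
        padded = f"  {word} "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def match_likelihood(a: str, b: str) -> int:
    """Hypothetical interface: Jaccard similarity of the two trigram sets,
    scaled to an integer between 0 (nothing shared) and 100 (same sets)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 100  # two empty strings are treated as identical
    return round(100 * len(ta & tb) / len(ta | tb))

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b, computed one DP row at a time."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

a = "pink floyd - dark side of the moon - money"
b = "money - dark side of the moon - pink floyd"
print(match_likelihood(a, b))              # 100: same words, different order
print(match_likelihood("money", "monkey")) # 44: 4 shared trigrams out of 9
print(levenshtein("kitten", "sitting"))    # 3
```

Because the trigram score is computed over unordered sets, it rates the reordered CDDB strings as identical. The edit distance between the same two strings is nonzero, because their characters appear in a different order. This is one reason Levenshtein distance alone does not answer question (1).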

Responses

Re: String Similarity at 2006-05-19 19:54:32 from Martijn van Oosterhout
Re: String Similarity at 2006-05-19 19:59:30 from Andrew Dunstan
Re: String Similarity at 2006-05-19 20:52:53 from Mark Dilger
Re: String Similarity at 2006-05-19 22:50:00 from Greg Sabino Mullane
Re: String Similarity at 2006-05-20 04:30:09 from Oleg Bartunov
Re: String Similarity at 2006-05-22 03:15:42 from Christopher Kings-Lynne