Re: String Similarity

From: Christopher Kings-Lynne <chris(dot)kings-lynne(at)calorieking(dot)com>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: String Similarity
Date: 2006-05-22 03:15:42
Message-ID: 44712CDE.1090608@calorieking.com
Lists: pgsql-hackers

Try contrib/pg_trgm...
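For readers unfamiliar with the module, here is a rough Python sketch of the trigram idea behind it. This is only an approximation written for illustration, not pg_trgm's actual code: the padding rule (two spaces before each word, one after), the lowercasing, and the function names trigrams() and similarity() are assumptions of this sketch. Trigrams are taken per word, so the order of the words does not affect the score.

```python
import re


def trigrams(text):
    """Return the set of padded, per-word trigrams of text.

    Assumption of this sketch: words are runs of ASCII letters and
    digits, lowercased, padded with two leading and one trailing space.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0  # nothing to compare
    return len(ta & tb) / len(ta | tb)


a = "pink floyd - dark side of the moon - money"
b = "money - dark side of the moon - pink floyd"
print(similarity(a, b))                        # same words, different order
print(similarity("pink floyd", "pink flyod"))  # a misspelling scores lower
```

Under these assumptions the reordered titles from the quoted message produce identical trigram sets, which is the behaviour the original question is after.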

Chris

Mark Woodward wrote:
> I have a side project that needs to "intelligently" know if two strings
> are contextually similar. Think about how CDDB information is collected
> and sorted. It isn't perfect, but there should be enough information to be
> usable.
>
> Think about this:
>
> "pink floyd - dark side of the moon - money"
> "dark side of the moon - pink floyd - money"
> "money - dark side of the moon - pink floyd"
> etc.
>
> To a human, these strings are almost identical. Similarly:
>
> "dark floyd of money moon pink side the"
>
> is a puzzle to be solved by 13-year-old children before the movie starts.
>
> My post has three questions:
>
> (1) Does anyone know of an efficient and numerically quantified method of
> detecting these sorts of things? I currently have a fairly inefficient and
> numerically bogus solution that may be the only non-impossible solution
> for the problem.
>
> (2) Does anyone see a need for this feature in PostgreSQL? If so, what
> kind of interface would be best accepted as a patch? I am currently
> returning a match likelihood between 0 and 100.
>
> (3) Is there also a desire for a Levenshtein distance function for text
> and varchars? I experimented with it, and was forced to write the function
> in item #1.
>
>
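On item (3), a minimal Python sketch of the textbook Levenshtein (edit) distance may help show why it fits poorly with reordered titles. This is the standard dynamic-programming formulation, not the function from the quoted message; the name levenshtein() is chosen here for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("pink floyd - money", "money - pink floyd"))
```

Because the distance counts character edits in position, moving a whole word to the other end of the string is charged as many separate edits, so reordered titles look far apart even though they contain the same words.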

--
Christopher Kings-Lynne

Technical Manager
CalorieKing
Tel: +618.9389.8777
Fax: +618.9389.8444
chris(dot)kings-lynne(at)calorieking(dot)com
www.calorieking.com
