Re: General guidance: Levenshtein distance versus other similarity algorithms

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Rachel Owsley <Rachel(dot)Owsley(at)edointeractive(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: General guidance: Levenshtein distance versus other similarity algorithms
Date:	2012-07-25 18:31:34
Message-ID:	CAHyXU0ycbACiVOLQ2Q-nKfr7c4oSs+4Hgbj5aD7g=J2K1yTwXQ@mail.gmail.com
Lists:	pgsql-general

On Mon, Jul 23, 2012 at 11:55 AM, Rachel Owsley
<Rachel(dot)Owsley(at)edointeractive(dot)com> wrote:
> Hi,
>
> I am hoping you can give me some guidance here. I’m using postgresql 9.1.
>
> Basically, I’m trying to create a query on a table of businesses that will
> return all similar matches to a business name. This is a huge table, and
> there is a lot of variation in names. The length of the string can be up to
> 255. I’ve used regex, but there are always some variations of the name that
> are missed when I do a regex. So I decided to look at distance measures.
>
> Has anyone compared the fuzzystrmatch package to pg_similarity?
>
> Would the levenshtein function in postgresql be the best way to go here? If
> so, should I use levenshtein in the contribution package or install the
> pgsimilarity package? Has anyone tried both implementations?

Another option that works with 9.1 is the pg_trgm module
(http://www.postgresql.org/docs/9.1/static/pgtrgm.html). It works
very well and has the advantage of built-in GiST and GIN index
operator classes, so similarity searches on a large table can use an
index instead of scanning every row.
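
For anyone unfamiliar with trigram matching, here is a rough Python
sketch of the idea behind pg_trgm's similarity score. It is only an
approximation written for illustration, not the module's actual code:
the function names `trigrams` and `similarity` are made up here, and
the word splitting and padding rules (lowercase, split on
non-alphanumerics, two leading spaces and one trailing space per word)
are assumptions based on the module's documentation.

```python
import re


def trigrams(text):
    """Return the set of 3-character substrings for each word in text.

    Assumption: words are lowercased ASCII alphanumeric runs, each padded
    with two leading spaces and one trailing space before slicing.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by total distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # Hypothetical business names, purely for illustration.
    for name in ["Acme Corp.", "ACME Corporation", "Zenith Bakery"]:
        print(name, round(similarity("Acme Corp", name), 2))
```

Because the score depends on shared fragments rather than edit
operations, it tends to tolerate punctuation, case differences and
reordered words better than a plain edit distance, which may suit
business names with lots of variation.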

Can't speak on pg_similarity, haven't used it.

merlin

In response to

General guidance: Levenshtein distance versus other similarity algorithms at 2012-07-23 16:55:56 from Rachel Owsley

Responses

Re: General guidance: Levenshtein distance versus other similarity algorithms at 2012-07-25 20:15:33 from Rachel Owsley
