Fuzzy matching?

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Fuzzy matching?
Date: 2001-07-31 16:05:28
Message-ID: web-94360@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches pgsql-sql

Folks,

For many of my programs, it would be extremely useful to have some form
of "fuzzy matching" for VARCHAR fields. There are two kinds of fuzzy
matching for words that I know of:

1. Phonetic matching, which would be nice but will have to wait for
someone's $100,000 project;

2. Textual mathcing, which I will outline below.

The way textual fuzzy matching should work is as follows:
The developer supplies two VARCHARs to match and a number/percent of
character mis-match that is acceptable:

Fuzzy_match('Thornton','Tornton',1)

And the fuzzy_match should return True if the two phrases are no more
than that number of characters different. Thus, we should get:

fuzzy_match('Thornton','Tornton',1) = TRUE
fuzzy_match('Thornton','Torntin',1) = FALSE
fuzzy_match('Thornton','Torntin',2) = TRUE

Unfortunately, I cannot think of a way to make this happen in a function
without cycling through all the possible permutations of characters for
both words or doing some character-by-character comparison with
elaborate logic for placement. Either of these approaches would be very
slow, and completely unsuitable for column comparisons on large tables.

Can anyone suggest some shortcuts here? Perhaps using pl/perl or
something similar?

Grazie!

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

Attachment Content-Type Size
unknown_filename text/plain 2 bytes
unknown_filename text/plain 2 bytes
unknown_filename text/plain 2 bytes

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2001-07-31 16:16:58 Re: Patch to add insert of multiple tuples per INSERT statement
Previous Message Liam Stewart 2001-07-31 15:09:40 Patch to add insert of multiple tuples per INSERT statement

Browse pgsql-sql by date

  From Date Subject
Next Message Joe Conway 2001-07-31 17:07:05 Re: Fuzzy matching?
Previous Message Tom Lane 2001-07-31 15:16:43 Re: union in subselect?