Re: Fuzzy matching?

From: "Timothy H(dot) Keitt" <tklistaddr(at)keittlab(dot)bio(dot)sunysb(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Fuzzy matching?
Date: 2001-08-01 02:27:17
Message-ID: 3B676905.90208@keittlab.bio.sunysb.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches pgsql-sql

I posted this many moons ago to pgsql-hackers. 'Guess nobody noticed.

Tim

Josh Berkus wrote:

>Folks,
>
>For many of my programs, it would be extremely useful to have some form
>of "fuzzy matching" for VARCHAR fields. There are two kinds of fuzzy
>matching for words that I know of:
>
>1. Phonetic matching, which would be nice but will have to wait for
>someone's $100,000 project;
>
>2. Textual mathcing, which I will outline below.
>
>The way textual fuzzy matching should work is as follows:
>The developer supplies two VARCHARs to match and a number/percent of
>character mis-match that is acceptable:
>
>Fuzzy_match('Thornton','Tornton',1)
>
>And the fuzzy_match should return True if the two phrases are no more
>than that number of characters different. Thus, we should get:
>
>fuzzy_match('Thornton','Tornton',1) = TRUE
>fuzzy_match('Thornton','Torntin',1) = FALSE
>fuzzy_match('Thornton','Torntin',2) = TRUE
>
>Unfortunately, I cannot think of a way to make this happen in a function
>without cycling through all the possible permutations of characters for
>both words or doing some character-by-character comparison with
>elaborate logic for placement. Either of these approaches would be very
>slow, and completely unsuitable for column comparisons on large tables.
>
>Can anyone suggest some shortcuts here? Perhaps using pl/perl or
>something similar?
>
>Grazie!
>
>-Josh Berkus
>
>______AGLIO DATABASE SOLUTIONS___________________________
> Josh Berkus
> Complete information technology josh(at)agliodbs(dot)com
> and data management solutions (415) 565-7293
> for law firms, small businesses fax 621-2533
> and non-profit organizations. San Francisco
>
>
>------------------------------------------------------------------------
>
>
>
>------------------------------------------------------------------------
>
>
>
>------------------------------------------------------------------------
>
>
>
>------------------------------------------------------------------------
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>
> Part 1.2
>
> Content-Type:
>
> text/plain
> Content-Encoding:
>
> base64
>
>
> ------------------------------------------------------------------------
> Part 1.3
>
> Content-Type:
>
> text/plain
> Content-Encoding:
>
> base64
>
>
> ------------------------------------------------------------------------
> Part 1.4
>
> Content-Type:
>
> text/plain
> Content-Encoding:
>
> base64
>
>
> ------------------------------------------------------------------------
> Part 1.5
>
> Content-Type:
>
> text/plain
> Content-Encoding:
>
> binary
>
>

--
Timothy H. Keitt
Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, New York 11794 USA
Phone: 631-632-1101, FAX: 631-632-7626
http://life.bio.sunysb.edu/ee/keitt/

Attachment Content-Type Size
levenshtein_distance.c text/plain 5.3 KB

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Timothy H. Keitt 2001-08-01 02:40:51 Re: Fuzzy matching?
Previous Message Helge Bahmann 2001-08-01 00:35:00 Revised: Allow IDENT authentication on local connections

Browse pgsql-sql by date

  From Date Subject
Next Message Timothy H. Keitt 2001-08-01 02:40:51 Re: Fuzzy matching?
Previous Message Christopher Sawtell 2001-07-31 21:31:51 Re: Fuzzy matching?