Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch)

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: jim(at)nasby(dot)net, "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch)
Date: 2003-06-23 03:56:38
Message-ID: 200306230356.h5N3ucV22076@candle.pha.pa.us
Lists: pgsql-general pgsql-patches

Joe Conway wrote:
> (I never saw this make it to the list yesterday, so I'm resending to
> patches)
>
> Jim C. Nasby wrote:
> > Second argument to metaphone is supposed to set the limit on the
> > number of characters to return, but it breaks on some phrases:
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'Hello world'::varchar AS a) a;
> > HLW | HLWR | HLWRLT
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> > AKM | AKMKS | AKMKSMMRL
> >
> > In every case I've found that does this, the 4th and 5th letters are
> > always 'KS'.
>
> Nice catch.
>
> There was a bug in the original metaphone algorithm from CPAN. Patch
> attached (while I was at it I updated my email address, changed the
> copyright to PGDG, and removed an unnecessary palloc). Here's how it
> looks now:
>
> regression=# select metaphone(a,4) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> metaphone
> -----------
> AKMK
> (1 row)
>
> regression=# select metaphone(a,5) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> metaphone
> -----------
> AKMKS
> (1 row)
>
> Please apply.
>
> Thanks,
>
> Joe
>

> Index: contrib/fuzzystrmatch/README.fuzzystrmatch
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/README.fuzzystrmatch,v
> retrieving revision 1.2
> diff -c -r1.2 README.fuzzystrmatch
> *** contrib/fuzzystrmatch/README.fuzzystrmatch 7 Aug 2001 18:16:01 -0000 1.2
> --- contrib/fuzzystrmatch/README.fuzzystrmatch 6 Jun 2003 16:37:54 -0000
> ***************
> *** 3,9 ****
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Copyright (c) Joseph Conway <joseph(dot)conway(at)home(dot)com>, 2001;
> *
> * levenshtein()
> * -------------
> --- 3,12 ----
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Joe Conway <mail(at)joeconway(dot)com>
> ! *
> ! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> ! * ALL RIGHTS RESERVED;
> *
> * levenshtein()
> * -------------
> Index: contrib/fuzzystrmatch/fuzzystrmatch.c
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.c,v
> retrieving revision 1.7
> diff -c -r1.7 fuzzystrmatch.c
> *** contrib/fuzzystrmatch/fuzzystrmatch.c 10 Mar 2003 22:28:17 -0000 1.7
> --- contrib/fuzzystrmatch/fuzzystrmatch.c 6 Jun 2003 16:38:06 -0000
> ***************
> *** 3,9 ****
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Copyright (c) Joseph Conway <joseph(dot)conway(at)home(dot)com>, 2001;
> *
> * levenshtein()
> * -------------
> --- 3,12 ----
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Joe Conway <mail(at)joeconway(dot)com>
> ! *
> ! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> ! * ALL RIGHTS RESERVED;
> *
> * levenshtein()
> * -------------
> ***************
> *** 221,229 ****
> if (!(reqlen > 0))
> elog(ERROR, "metaphone: Requested Metaphone output length must be > 0");
>
> - metaph = palloc(reqlen);
> - memset(metaph, '\0', reqlen);
> -
> retval = _metaphone(str_i, reqlen, &metaph);
> if (retval == META_SUCCESS)
> {
> --- 224,229 ----
> ***************
> *** 629,635 ****
> /* KS */
> case 'X':
> Phonize('K');
> ! Phonize('S');
> break;
> /* Y if followed by a vowel */
> case 'Y':
> --- 629,636 ----
> /* KS */
> case 'X':
> Phonize('K');
> ! if (max_phonemes == 0 || Phone_Len < max_phonemes)
> ! Phonize('S');
> break;
> /* Y if followed by a vowel */
> case 'Y':
> Index: contrib/fuzzystrmatch/fuzzystrmatch.h
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.h,v
> retrieving revision 1.6
> diff -c -r1.6 fuzzystrmatch.h
> *** contrib/fuzzystrmatch/fuzzystrmatch.h 5 Sep 2002 00:43:06 -0000 1.6
> --- contrib/fuzzystrmatch/fuzzystrmatch.h 6 Jun 2003 16:38:13 -0000
> ***************
> *** 3,9 ****
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Copyright (c) Joseph Conway <joseph(dot)conway(at)home(dot)com>, 2001;
> *
> * levenshtein()
> * -------------
> --- 3,12 ----
> *
> * Functions for "fuzzy" comparison of strings
> *
> ! * Joe Conway <mail(at)joeconway(dot)com>
> ! *
> ! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> ! * ALL RIGHTS RESERVED;
> *
> * levenshtein()
> * -------------
>
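
The 'X' hunk above can be illustrated in isolation. What follows is only a minimal sketch, not the contrib/fuzzystrmatch code: toy_encode() and its guard_x flag are hypothetical names, and the encoder copies every letter unchanged except 'X', which expands to "KS". It assumes the length limit is otherwise tested only once per input letter, which is what the reported AKMKS output suggests. The guard uses the same condition as the patch, with len standing in for Phone_Len.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical toy encoder (an illustration, NOT the real metaphone):
 * letters are upper-cased and copied as-is, except 'X', which emits
 * two phonemes, 'K' then 'S'.  max_phonemes == 0 means "no limit".
 * guard_x == 0 reproduces the reported overrun; guard_x != 0 applies
 * the same check the patch adds before the second phoneme.
 * outsize must be at least 1.  Returns the number of phonemes written.
 */
static size_t
toy_encode(const char *word, size_t max_phonemes, int guard_x,
           char *out, size_t outsize)
{
    size_t len = 0;

    /* The limit is tested once per input letter, not once per phoneme. */
    for (; *word != '\0' && (max_phonemes == 0 || len < max_phonemes); word++)
    {
        int c = toupper((unsigned char) *word);

        if (!isalpha(c))
            continue;           /* skip spaces, digits, punctuation */

        if (c == 'X')
        {
            if (len + 1 < outsize)
                out[len++] = 'K';

            /* Without this test, 'S' can land one past the limit. */
            if (!guard_x || max_phonemes == 0 || len < max_phonemes)
            {
                if (len + 1 < outsize)
                    out[len++] = 'S';
            }
        }
        else if (len + 1 < outsize)
        {
            out[len++] = (char) c;
        }
    }

    out[len] = '\0';
    return len;
}
```

With a limit of 3, toy_encode("BOX", 3, 0, buf, sizeof buf) writes "BOKS" (one phoneme too many), while toy_encode("BOX", 3, 1, buf, sizeof buf) writes "BOK". This is the same shape as the metaphone(a,4) result changing from AKMKS to AKMK.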

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
