Jim C. Nasby wrote:
> Second argument to metaphone is supposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW | HLWR | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> AKM | AKMKS | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.
Nice catch.
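There was a bug in the original metaphone algorithm from CPAN: the 'X'
case emits two phonemes, 'K' and 'S', without rechecking the length
limit, so the result could run one character past the requested
length. Patch attached (while I was at it I updated my email address,
changed the copyright to PGDG, and removed an unnecessary palloc).
Here's how it looks now: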
regression=# select metaphone(a,4) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
metaphone
-----------
AKMK
(1 row)
regression=# select metaphone(a,5) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
metaphone
-----------
AKMKS
(1 row)
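To illustrate the overrun outside the backend, here is a minimal standalone sketch of the same guard. It is not the contrib code: PhoneBuf, phonize() and toy_encode() are hypothetical names made up for this example, and the toy encoder only knows that 'X' expands to "KS" while every other character maps to itself. The point is just that a case emitting two phonemes has to recheck the limit before the second one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the phoneme output buffer; not the contrib code. */
typedef struct
{
    char   buf[64];
    size_t len;
} PhoneBuf;

/* Append one phoneme code. The caller is responsible for the length limit. */
static void
phonize(PhoneBuf *p, char c)
{
    if (p->len + 1 < sizeof(p->buf))
    {
        p->buf[p->len++] = c;
        p->buf[p->len] = '\0';
    }
}

/*
 * Toy encoder: 'X' expands to "KS", any other character maps to itself.
 * max_phonemes == 0 means "no limit", mirroring the test in the patch.
 * With guarded == 0 the 'S' is appended unconditionally (the old behavior).
 */
static void
toy_encode(const char *in, size_t max_phonemes, int guarded, PhoneBuf *out)
{
    out->len = 0;
    out->buf[0] = '\0';

    for (; *in != '\0' && (max_phonemes == 0 || out->len < max_phonemes); in++)
    {
        if (*in == 'X')
        {
            phonize(out, 'K');
            /* Recheck the limit before emitting the second phoneme. */
            if (!guarded || max_phonemes == 0 || out->len < max_phonemes)
                phonize(out, 'S');
        }
        else
            phonize(out, *in);
    }
}
```

Calling toy_encode("AKMX", 4, 0, &p) leaves five characters in p.buf because the 'S' slips past the limit, while the guarded call toy_encode("AKMX", 4, 1, &p) stops at four.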
Please apply.
Thanks,
Joe
Index: contrib/fuzzystrmatch/README.fuzzystrmatch
===================================================================
RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/README.fuzzystrmatch,v
retrieving revision 1.2
diff -c -r1.2 README.fuzzystrmatch
*** contrib/fuzzystrmatch/README.fuzzystrmatch 7 Aug 2001 18:16:01 -0000 1.2
--- contrib/fuzzystrmatch/README.fuzzystrmatch 6 Jun 2003 16:37:54 -0000
***************
*** 3,9 ****
*
* Functions for "fuzzy" comparison of strings
*
! * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
*
* levenshtein()
* -------------
--- 3,12 ----
*
* Functions for "fuzzy" comparison of strings
*
! * Joe Conway <mail@joeconway.com>
! *
! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
! * ALL RIGHTS RESERVED;
*
* levenshtein()
* -------------
Index: contrib/fuzzystrmatch/fuzzystrmatch.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.c,v
retrieving revision 1.7
diff -c -r1.7 fuzzystrmatch.c
*** contrib/fuzzystrmatch/fuzzystrmatch.c 10 Mar 2003 22:28:17 -0000 1.7
--- contrib/fuzzystrmatch/fuzzystrmatch.c 6 Jun 2003 16:38:06 -0000
***************
*** 3,9 ****
*
* Functions for "fuzzy" comparison of strings
*
! * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
*
* levenshtein()
* -------------
--- 3,12 ----
*
* Functions for "fuzzy" comparison of strings
*
! * Joe Conway <mail@joeconway.com>
! *
! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
! * ALL RIGHTS RESERVED;
*
* levenshtein()
* -------------
***************
*** 221,229 ****
if (!(reqlen > 0))
elog(ERROR, "metaphone: Requested Metaphone output length must be > 0");
- metaph = palloc(reqlen);
- memset(metaph, '\0', reqlen);
-
retval = _metaphone(str_i, reqlen, &metaph);
if (retval == META_SUCCESS)
{
--- 224,229 ----
***************
*** 629,635 ****
/* KS */
case 'X':
Phonize('K');
! Phonize('S');
break;
/* Y if followed by a vowel */
case 'Y':
--- 629,636 ----
/* KS */
case 'X':
Phonize('K');
! if (max_phonemes == 0 || Phone_Len < max_phonemes)
! Phonize('S');
break;
/* Y if followed by a vowel */
case 'Y':
Index: contrib/fuzzystrmatch/fuzzystrmatch.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.h,v
retrieving revision 1.6
diff -c -r1.6 fuzzystrmatch.h
*** contrib/fuzzystrmatch/fuzzystrmatch.h 5 Sep 2002 00:43:06 -0000 1.6
--- contrib/fuzzystrmatch/fuzzystrmatch.h 6 Jun 2003 16:38:13 -0000
***************
*** 3,9 ****
*
* Functions for "fuzzy" comparison of strings
*
! * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
*
* levenshtein()
* -------------
--- 3,12 ----
*
* Functions for "fuzzy" comparison of strings
*
! * Joe Conway <mail@joeconway.com>
! *
! * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
! * ALL RIGHTS RESERVED;
*
* levenshtein()
* -------------