From: Dag Lem <dag(at)nimrod(dot)no>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: daitch_mokotoff module
It turns out that there actually exists an(other) implementation of
the Daitch-Mokotoff Soundex System which gets it right: the JOS
Soundex Calculator (https://www.jewishgen.org/jos/jossound.htm).
Other implementations I have been able to find, like the one in Apache
Commons Codec used in e.g. Elasticsearch, are far from correct.
The source code for the JOS Soundex Calculator is not available, as
far as I can tell; however, I have run the complete list of 98412 last
names through the calculator in order to have a good basis for comparison.
This revealed a few shortcomings in my implementation. In particular, I
had to go back to the drawing board in order to handle the dual nature
of "J" correctly. "J" can be either a vowel or a consonant in
Daitch-Mokotoff soundex, and this complicates encoding of the
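Because a letter like "J" can be read in more than one way, a single name can yield several codes. The following is a minimal sketch, in C, of expanding such alternatives into a set of six-digit codes. It assumes the name has already been split into sounds, each with one or two alternative digit strings, where "" means "not coded" (e.g. a letter read as a vowel). The `Sound` struct, `dm_expand()`, and the digits in the example are hypothetical and are not taken from the patch or the Daitch-Mokotoff rules table. The sketch also leaves out adjacency rules such as collapsing repeated codes.

```c
#include <string.h>

#define DM_CODE_LEN  6   /* Daitch-Mokotoff codes are six digits */
#define DM_MAX_CODES 32  /* hypothetical cap on alternative codes */

/* Hypothetical input: one entry per sound, with one or two alternative
 * digit strings.  "" means "not coded", e.g. a letter read as a vowel. */
typedef struct
{
    const char *alt[2];
    int         n_alt;
} Sound;

/* Recursively try every combination of alternatives, writing digits into
 * buf.  A finished code is zero-padded to DM_CODE_LEN digits and stored in
 * out[] unless an identical code is already there. */
static void
dm_expand_rec(const Sound *s, int n, int i, char *buf, int len,
              char out[][DM_CODE_LEN + 1], int *n_out)
{
    if (i == n || len == DM_CODE_LEN)
    {
        char        code[DM_CODE_LEN + 1];

        memcpy(code, buf, len);
        memset(code + len, '0', DM_CODE_LEN - len);  /* pad with zeros */
        code[DM_CODE_LEN] = '\0';

        for (int k = 0; k < *n_out; k++)
            if (strcmp(out[k], code) == 0)
                return;         /* duplicate code, keep only one copy */
        if (*n_out < DM_MAX_CODES)
            strcpy(out[(*n_out)++], code);
        return;
    }

    for (int a = 0; a < s[i].n_alt; a++)
    {
        int         l = len;

        /* Append this alternative's digits, truncating at DM_CODE_LEN. */
        for (const char *d = s[i].alt[a]; *d && l < DM_CODE_LEN; d++)
            buf[l++] = *d;
        dm_expand_rec(s, n, i + 1, buf, l, out, n_out);
    }
}

/* Returns the number of distinct codes written to out[]. */
static int
dm_expand(const Sound *s, int n, char out[][DM_CODE_LEN + 1])
{
    char        buf[DM_CODE_LEN];
    int         n_out = 0;

    dm_expand_rec(s, n, 0, buf, 0, out, &n_out);
    return n_out;
}
```

For the hypothetical sounds `{"4"}`, `{"", "3"}`, `{"5"}`, this yields two codes, 450000 and 435000. The middle sound is dropped in one branch and coded in the other, which mirrors how a vowel/consonant ambiguity leads to more than one result.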
I have also done a more thorough review and refactoring of the code,
which should hopefully make things quite a bit more understandable to
The changes are summarized as follows:
* Returns NULL for input without any encodable characters.
* Uses the same "unofficial" rules for "UE" as other implementations.
* Correctly considers "J" as either a vowel or a consonant.
* Corrected encoding for e.g. "HANNMANN".
* Code refactoring and comments for readability.
* Better examples in the documentation.
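The first change above (NULL for input without any encodable characters) amounts to a guard before encoding. The following is a minimal C sketch of such a guard. `dm_first_encodable()` is a hypothetical helper, not code from the patch. For simplicity it treats only letters recognized by `isalpha()` as encodable. In the default C locale, that means non-ASCII letters such as the "ñ" in "cedeño" are ignored by this sketch. A SQL-callable wrapper could then return SQL NULL with `PG_RETURN_NULL()` when the helper returns NULL.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first character the
 * encoder could handle, or NULL if the input has none.  Only characters
 * accepted by isalpha() count as encodable in this sketch. */
static const char *
dm_first_encodable(const char *s)
{
    for (; *s; s++)
        if (isalpha((unsigned char) *s))
            return s;
    return NULL;
}
```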
The implementation now agrees with the JOS Soundex Calculator for
the 98412 last names mentioned above, with only the following differences:
JOS: cedeño 430000 530000
PG: cedeño 436000 536000
JOS: sadab(khura) 437000
PG: sadab(khura) 437590
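A name can have several codes, so one reasonable way to compare two tools' outputs is as unordered sets. The following is a minimal C sketch of such a check. It assumes each result is a space-separated string of codes, as in the lines above. `codes_match()` and its helpers are hypothetical and are not part of the patch or the JOS calculator.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_TOKENS 32   /* hypothetical limit for this sketch */

/* Split a space-separated list of codes in place; -1 if too many. */
static int
split_codes(char *buf, char *tok[], int max)
{
    int         n = 0;

    for (char *t = strtok(buf, " "); t; t = strtok(NULL, " "))
    {
        if (n == max)
            return -1;
        tok[n++] = t;
    }
    return n;
}

/* True if every code in x also appears in y. */
static bool
all_in(char *x[], int nx, char *y[], int ny)
{
    for (int i = 0; i < nx; i++)
    {
        bool        found = false;

        for (int j = 0; j < ny && !found; j++)
            found = strcmp(x[i], y[j]) == 0;
        if (!found)
            return false;
    }
    return true;
}

/* Compare two outputs as sets of codes, ignoring order. */
static bool
codes_match(const char *a, const char *b)
{
    char        bufa[256], bufb[256];
    char       *ta[MAX_TOKENS], *tb[MAX_TOKENS];
    int         na, nb;

    if (strlen(a) >= sizeof bufa || strlen(b) >= sizeof bufb)
        return false;           /* input too long for this sketch */
    strcpy(bufa, a);
    strcpy(bufb, b);
    na = split_codes(bufa, ta, MAX_TOKENS);
    nb = split_codes(bufb, tb, MAX_TOKENS);
    if (na < 0 || nb < 0)
        return false;
    return all_in(ta, na, tb, nb) && all_in(tb, nb, ta, na);
}
```

With the outputs listed above, `codes_match("430000 530000", "436000 536000")` returns false, while the same two codes given in a different order would match.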
I hope this addition to the fuzzystrmatch extension module will prove
to be useful to others as well!
This is my very first code contribution to PostgreSQL, and I would be
grateful for any advice on how to proceed in order to get the patch