|From:||Dag Lem <dag(at)nimrod(dot)no>|
|To:||Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>|
|Cc:||PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: daitch_mokotoff module|
Sorry about the latest unfinished email - don't know what key
combination I managed to hit there.
Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> On 2022-Dec-23, Dag Lem wrote:
> So, yes, I'm proposing that we return those as array elements and that
> @> is used to match them.
Looking into the array operators I guess that to match such arrays
directly one would actually use && (overlaps) rather than @> (contains),
but I digress.
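To make the difference between the two operators concrete, here is a minimal sketch that models them with Python sets. It is only an approximation of the PostgreSQL semantics (NULL handling is ignored), the helper names contains and overlaps are made up for illustration, and the codes are the ones quoted further down in this message.

```python
# Approximate PostgreSQL's array operators with Python sets.
# Assumption: element order and duplicates do not matter, NULLs are ignored.

def contains(a, b):
    """Rough analogue of `a @> b`: every element of b also appears in a."""
    return set(b) <= set(a)

def overlaps(a, b):
    """Rough analogue of `a && b`: a and b share at least one element."""
    return bool(set(a) & set(b))

# Soundex codes as quoted in this thread.
vera = ["790000"]
veras = ["794000"]
borja = ["790000", "794000"]

print(contains(borja, vera))   # Borja's codes include Vera's single code
print(contains(vera, borja))   # but not the other way around
print(overlaps(vera, borja))   # symmetric: one shared code is enough
print(overlaps(vera, veras))   # no shared code, so no match
```

With contains the result depends on which side has more codes, whereas overlaps is symmetric, which is why it seems the better fit for comparing two names directly.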
The function is changed to return an array of soundex codes - I hope it
is now to your liking :-)
I also improved on the documentation example (using Full Text Search).
AFAIK you can't make general queries like that using arrays. In any
case, I must admit that text arrays seem like more natural building
blocks than space-delimited text here.
>> BTW Vera 790000 does not match Veras 794000, because they don't sound
>> the same (up to the maximum soundex code length).
> No, and maybe that's okay because they have different codes. But they
> are both similar, in Daitch-Mokotoff, to Borja, which has two codes,
> 790000 and 794000. (Any Spanish speaker will readily tell you that
> neither Vera nor Veras is similar in any way to Borja, but D-M has
> chosen to say that each of them matches one of Borja's codes. So they
> *are* related, even though indirectly, and as a genealogist you *may* be
> interested in getting a match for a person called Vera when looking for
> relatives to a person called Veras. And, as a Spanish speaker, that
> would make a lot of sense to me.)
It is what it is - we can't call it Daitch-Mokotoff Soundex while
implementing something else. Having said that, one can always pre- or
postprocess to tweak the results.
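As one illustration of such postprocessing, the sketch below also links names that are only related through an intermediate name sharing a code with each of them. This is a hypothetical helper, not something the module does or is proposed to do; the function names and the dictionary layout are assumptions made for the example, and the codes are the ones quoted above.

```python
from collections import defaultdict

def names_by_code(codes_by_name):
    """Invert {name: [codes]} into {code: {names}}."""
    index = defaultdict(set)
    for name, codes in codes_by_name.items():
        for code in codes:
            index[code].add(name)
    return index

def indirect_matches(name, codes_by_name):
    """Names sharing a code with `name`, or with a direct match of `name`."""
    index = names_by_code(codes_by_name)
    # First hop: every name that shares at least one code with `name`.
    direct = set()
    for code in codes_by_name[name]:
        direct |= index[code]
    # Second hop: every name that shares a code with a direct match.
    related = set(direct)
    for other in direct:
        for code in codes_by_name[other]:
            related |= index[code]
    related.discard(name)
    return related

# Codes as quoted in this thread.
codes = {
    "Vera": ["790000"],
    "Veras": ["794000"],
    "Borja": ["790000", "794000"],
}

print(sorted(indirect_matches("Vera", codes)))
```

Here Vera reaches Veras only via Borja. Widening the match like this will of course increase the number of false positives further.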
Daitch-Mokotoff Soundex is known to produce false positives, but that is
in many cases not a problem.
Even though it's clearly tuned for Jewish names, the soundex algorithm
seems to work just fine for European names (we use it to match mostly