|From:||Dag Lem <dag(at)nimrod(dot)no>|
|To:||Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: daitch_mokotoff module|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
> Thanks, looks interesting. A couple generic comments, based on a quick
> code review.
Thank you for the constructive review!
> 1) Can the extension be marked as trusted, just like fuzzystrmatch?
I have now moved the daitch_mokotoff function into the fuzzystrmatch
module, as suggested by Andrew Dunstan.
> 2) The docs really need an explanation of what the extension is for,
> not just a link to fuzzystrmatch. Also, a couple examples would be
> helpful, I guess - similarly to fuzzystrmatch. The last line in the
> docs is annoyingly long.
Please see the updated documentation for the fuzzystrmatch module.
> 3) What's daitch_mokotov_header.pl for? I mean, it generates the
> header, but when do we need to run it?
It only has to be run if the soundex rules are changed. I have now
made the dependencies explicit in the fuzzystrmatch Makefile.
> 4) It seems to require perl-open, which is a module we did not need
> until now. Not sure how well supported it is, but maybe we can use a
> more standard module?
I believe Perl I/O layers have been part of Perl core for two decades
> 5) Do we need to keep DM_MAIN? It seems to be meant for some kind of
> testing, but our regression tests certainly don't need it (or the
> palloc mockup). I suggest to get rid of it.
Done. BTW this was modeled after dmetaphone.c
> 6) I really don't understand some of the comments in
> daitch_mokotov.sql, like for example:
> -- testEncodeBasic
> -- Tests covered above are omitted.
> Also, comments with names of Java methods seem pretty confusing. It'd
> be better to actually explain what rules are the tests checking.
The tests were copied from various web sites and implementations. I have
cut down on the number of tests and made the comments more to the point.
> 7) There are almost no comments in the .c file (ignoring the comment
> on top). Short functions like initialize_node are probably fine
> without one, but e.g. update_node would deserve one.
More comments are added to both the .h and the .c file.
> 8) Some of the lines are pretty long (e.g. the update_node signature
> is almost 170 chars). That should be wrapped. Maybe try running
> pgindent on the code, that'll show which parts need better formatting
> (so as not to get broken later).
Fixed. I did run pgindent earlier, however it didn't catch those long
> 9) I'm sure there's better way to get the number of valid chars than this:
> for (i = 0, ilen = 0; (c = read_char(&str[i], &ilen)) && (c < 'A' ||
> c > ']'); i += ilen)
> Say, a while loop or something?
The code gets to the next encodable character, skipping any other
characters. I have now added a comment which should hopefully make this
clearer, and broken up the for loop for readability.
Please find attached the revised patch.
|Next Message||Robert Haas||2021-12-14 23:00:34||generalized conveyor belt storage|
|Previous Message||Tom Lane||2021-12-14 22:30:11||Re: Remove pg_strtouint64(), use strtoull() directly|