From: | Arne Roland <A(dot)Roland(at)index(dot)de> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Permute underscore separated components of columns before fuzzy matching |
Date: | 2023-01-06 21:29:12 |
Message-ID: | 555aa9ea1043489bae7a41962877bd04@index.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
we have the great fuzzy string match, that comes up with suggestions in the case of a typo of a column name.
Since underscores are the de facto standard of separating words, it would also make sense to also generate suggestions, if the order of words gets mixed up. Example: If the user types timstamp_entry instead of entry_timestamp the suggestion shows up.
The attached patch does that for up to three segments, that are separated by underscores. The permutation of two segments is treated the same way a wrongly typed char would be.
The permutation is skipped, if the typed column name contains more than 6 underscores to prevent a meaningful (measured on my development machine) slowdown, if the user types to many underscores. In terms of underscores m and the length of the individual strings n_att and n_col the trivial upper bound is O(n_att * n_col * m^2). Considering, that strings with a lot of underscores have a bigger likelihood of being long as well, I simply decided to add it. I still wonder a bit whether it should be disabled entirely (as this patch does) or only the swap-three sections part as the rest would bound by O(n_att * n_col * m). But the utility of only swapping two sections seems a bit dubious to me, if I have 7 or more of them.
To me this patch seems simple (if string handling in C can be called that way) and self contained. Despite my calculations above, it resides in a non performance critical piece of code. I think of it as a quality of life thing.
Let me know what you think. Thank you!
Regards
Arne
Attachment | Content-Type | Size |
---|---|---|
0001-fuzzy_underscore_permutation.patch | text/x-patch | 4.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jacob Champion | 2023-01-06 21:55:47 | Re: RFC: logical publication via inheritance root? |
Previous Message | Andrew Dunstan | 2023-01-06 21:08:29 | Re: [PATCH] Expand character set for ltree labels |