Permute underscore separated components of columns before fuzzy matching

From: Arne Roland <A(dot)Roland(at)index(dot)de>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Permute underscore separated components of columns before fuzzy matching
Date: 2023-01-06 21:29:12
Message-ID: 555aa9ea1043489bae7a41962877bd04@index.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

we have the great fuzzy string match, that comes up with suggestions in the case of a typo of a column name.

Since underscores are the de facto standard of separating words, it would also make sense to also generate suggestions, if the order of words gets mixed up. Example: If the user types timstamp_entry instead of entry_timestamp the suggestion shows up.

The attached patch does that for up to three segments, that are separated by underscores. The permutation of two segments is treated the same way a wrongly typed char would be.

The permutation is skipped, if the typed column name contains more than 6 underscores to prevent a meaningful (measured on my development machine) slowdown, if the user types to many underscores. In terms of underscores m and the length of the individual strings n_att and n_col the trivial upper bound is O(n_att * n_col * m^2). Considering, that strings with a lot of underscores have a bigger likelihood of being long as well, I simply decided to add it. I still wonder a bit whether it should be disabled entirely (as this patch does) or only the swap-three sections part as the rest would bound by O(n_att * n_col * m). But the utility of only swapping two sections seems a bit dubious to me, if I have 7 or more of them.

To me this patch seems simple (if string handling in C can be called that way) and self contained. Despite my calculations above, it resides in a non performance critical piece of code. I think of it as a quality of life thing.
Let me know what you think. Thank you!

Regards
Arne

Attachment Content-Type Size
0001-fuzzy_underscore_permutation.patch text/x-patch 4.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2023-01-06 21:55:47 Re: RFC: logical publication via inheritance root?
Previous Message Andrew Dunstan 2023-01-06 21:08:29 Re: [PATCH] Expand character set for ltree labels