Re: Fuzzy substring searching with the pg_trgm extension

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fuzzy substring searching with the pg_trgm extension
Date: 2016-01-29 14:15:18
Message-ID: 56AB73F6.7050200@sigaev.ru
Lists: pgsql-hackers

> The behavior of this function is surprising to me.
>
> select substring_similarity('dog' , 'hotdogpound') ;
>
> substring_similarity
> ----------------------
> 0.25
>
Substring search was designed to find a similar word within a string:
contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
substring_similarity
----------------------
0.75

contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
substring_similarity
----------------------
1
It seems to me that users search for words in a long string. But I agree that a
more detailed explanation is needed and, maybe, we need to change the feature
name to fuzzywordsearch or something else; I can't think of a good one.
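
To make the three results above easier to follow, here is a rough Python model of the idea. It is only a sketch under stated assumptions, not the pg_trgm C code: it assumes each word is lowercased and padded with two leading blanks and one trailing blank before trigrams are taken, and that the score is the best set similarity (shared trigrams divided by the union) over any contiguous run of the second string's trigrams. The names trigrams and substring_similarity_sketch are hypothetical.

```python
import re


def trigrams(text):
    """Ordered trigram list, built one padded word at a time (simplified model)."""
    result = []
    # Assumption: words are maximal runs of letters/digits, compared in lowercase.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading blanks, one trailing blank
        result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def substring_similarity_sketch(query, text):
    """Best similarity between the query's trigram set and any contiguous
    extent of the text's trigram list (brute force over all extents)."""
    q = set(trigrams(query))
    t = trigrams(text)
    best = 0.0
    for lo in range(len(t)):
        for hi in range(lo + 1, len(t) + 1):
            extent = set(t[lo:hi])
            shared = len(q & extent)
            # Shared trigrams divided by the size of the union.
            best = max(best, shared / len(q | extent))
    return best


print(substring_similarity_sketch("dog", "hotdogpound"))    # 0.25
print(substring_similarity_sketch("dog", "hot dogpound"))   # 0.75
print(substring_similarity_sketch("dog", "hot dog pound"))  # 1.0
```

In this model 'dog' yields four trigrams. Inside 'hotdogpound' only the trigram "dog" itself is shared (1/4), 'hot dogpound' also supplies the two trigrams that mark the start of a word (3/4), and 'hot dog pound' supplies the one that marks the end of a word as well (4/4).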

>
> Also, should we have a function which indicates the position in the
> 2nd string at which the most similar match to the 1st argument occurs?
>
> select substring_similarity_pos('dog' , 'hotdogpound') ;
>
> answering: 4
Interesting. I think it will be useful in some cases.

>
> We could call them <<-> and <->> , where the first corresponds to <%
> and the second to %>
Agreed.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
