| From: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> | 
|---|---|
| To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru> | 
| Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Fuzzy substring searching with the pg_trgm extension | 
| Date: | 2016-01-29 15:58:39 | 
| Message-ID: | 56AB8C2F.2080609@postgrespro.ru | 
| Lists: | pgsql-hackers | 
On 29.01.2016 18:39, Alvaro Herrera wrote:
> Teodor Sigaev wrote:
>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' ,  'hotdogpound') ;
>>>
>>>   substring_similarity
>>> ----------------------
>>>                   0.25
>>>
>> Substring search was designed to search for a similar word in a string:
>> contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
>>   substring_similarity
>> ----------------------
>>                   0.75
>>
>> contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
>>   substring_similarity
>> ----------------------
>>                      1
>
> Hmm, this behavior looks too much like magic to me.  I mean, a substring
> is a substring -- why are we treating the space as a special character
> here?
>
I think I can rename this function to subword_similarity() and correct
the documentation.
The current behavior is designed to find the most similar word in a text.
For example, if we searched for just a substring (not a word), we would
get the following result:
select substring_similarity('dog', 'dogmatist');
  substring_similarity
---------------------
                     1
(1 row)
But I think this is wrong: they are completely different words.
Maybe another function should be added for searching for a similar
substring (not a word) in a text?
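
To illustrate why the space matters, here is a rough Python sketch of the
idea. It is not the patch's C code. It assumes pg_trgm-style padding (two
spaces before each word, one after), splits words on whitespace only, and
scores the best contiguous run of the text's trigrams as
shared / (query + run - shared). The function names and the "pad" flag are
hypothetical; pad=False only models what a plain substring search might do.

```python
def trigrams(word, pad=True):
    """Trigrams of one word. With pad=True the word is padded the way
    pg_trgm pads words: two spaces in front, one space behind."""
    s = ("  " + word.lower() + " ") if pad else word.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


def text_trigrams(text, pad=True):
    """Ordered trigrams of a text. Each whitespace-separated word is
    handled on its own (a simplification of pg_trgm's word splitting)."""
    result = []
    for word in text.split():
        result.extend(trigrams(word, pad))
    return result


def subword_similarity_sketch(query, text, pad=True):
    """Best score over all contiguous windows of the text's trigrams.

    score = shared / (len(query set) + len(window set) - shared)
    This brute-force loop is an illustrative assumption, not the patch.
    """
    q = set(text_trigrams(query, pad))
    t = text_trigrams(text, pad)
    best = 0.0
    for start in range(len(t)):
        for end in range(start + 1, len(t) + 1):
            window = set(t[start:end])
            shared = len(q & window)
            best = max(best, shared / (len(q) + len(window) - shared))
    return best


# Word-aware (padded) scoring, matching the numbers quoted in this thread:
print(subword_similarity_sketch("dog", "hotdogpound"))    # -> 0.25
print(subword_similarity_sketch("dog", "hot dogpound"))   # -> 0.75
print(subword_similarity_sketch("dog", "hot dog pound"))  # -> 1.0

# Hypothetical plain-substring scoring (no padding):
print(subword_similarity_sketch("dog", "dogmatist", pad=False))  # -> 1.0
```

With padding, 'dog' yields the trigrams "  d", " do", "dog" and "og ", so
only a separate word 'dog' in the text can match all four. Without padding,
the single trigram "dog" matches anywhere, including inside 'dogmatist'.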
-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company