
Re: Fuzzy substring searching with the pg_trgm extension

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fuzzy substring searching with the pg_trgm extension
Date:	2016-02-11 12:30:44
Message-ID:	56BC7EF4.2030903@sigaev.ru
Lists:	pgsql-hackers

>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' , 'hotdogpound') ;
>>>
>>> substring_similarity
>>> ----------------------
>>> 0.25
>>>
>> Substring search was designed to search for a similar word in a string:
>> contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
>> substring_similarity
>> ----------------------
>> 0.75
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
>> substring_similarity
>> ----------------------
>> 1
>
> Hmm, this behavior looks too much like magic to me. I mean, a substring
> is a substring -- why are we treating the space as a special character
> here?

Because it isn't a regex-style substring search. From its original
implementation, pg_trgm has worked on the words in a string:
contrib_regression=# select similarity('block hole', 'hole black');
similarity
------------
0.571429
contrib_regression=# select similarity('block hole', 'black hole');
similarity
------------
0.571429

It ignores the spaces between words and the order of the words.
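To make the word-based behaviour concrete, here is a rough Python sketch of trigram extraction and similarity. It is a model, not pg_trgm's C code, and it assumes ASCII input and the default build where case is folded and only alphanumeric characters form words. Each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. With those assumptions it gives 8 shared out of 14 distinct trigrams (4/7, about 0.571429) for both word orders shown above.

```python
import re


def trigrams(text):
    """Rough model of pg_trgm trigram extraction (assumes ASCII input and the
    default build: case-folded, only alphanumerics form words)."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Pad each word with two leading spaces and one trailing space,
        # so word boundaries show up in the trigram set.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    # Word order does not matter: both calls see the same trigram sets.
    print(f"{similarity('block hole', 'hole black'):.6f}")  # 0.571429
    print(f"{similarity('block hole', 'black hole'):.6f}")  # 0.571429
```

Because trigrams are collected per word into a set, the space between words contributes nothing and reordering the words leaves the set unchanged.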

I agree that substring_similarity is a confusing name, but what it actually
does is find the word in the second argument that is most similar to the
first argument and return their similarity.
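One way to read the numbers quoted earlier in the thread is the following hypothetical, simplified Python model. It is not the patch's C code, which may differ (for example, in how it treats a match running across word boundaries). It scores the fraction of the first argument's trigrams that occur in the best-matching single word of the second argument, under the same ASCII/alphanumeric assumptions as a plain trigram model. The name `substring_similarity_model` is made up for illustration.

```python
import re

WORD = re.compile(r"[a-z0-9]+")  # assumption: ASCII, alphanumerics only


def trigrams(text):
    """Rough model of pg_trgm trigram extraction: each lower-cased word is
    padded with two leading spaces and one trailing space."""
    result = set()
    for word in WORD.findall(text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def substring_similarity_model(needle, haystack):
    """Hypothetical simplified model: the largest fraction of the needle's
    trigrams that occur inside a single word of the haystack."""
    needle_trgm = trigrams(needle)
    if not needle_trgm:
        return 0.0
    best = 0.0
    for word in WORD.findall(haystack.lower()):
        shared = len(needle_trgm & trigrams(word))
        best = max(best, shared / len(needle_trgm))
    return best


if __name__ == "__main__":
    # Prints 0.25, 0.75 and 1.0, matching the psql output quoted above.
    for haystack in ("hotdogpound", "hot dogpound", "hot dog pound"):
        print(haystack, substring_similarity_model("dog", haystack))
```

'dog' has four trigrams ("  d", " do", "dog", "og "). In 'hotdogpound' only "dog" survives, since the word starts with "h"; in 'dogpound' the leading "  d" and " do" also match; and the standalone word 'dog' matches all four. If this model is right, the value is asymmetric: 'dog' vs 'dogpound' scores 3/4 here, whereas the symmetric similarity() over the full trigram union would be 3/10.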

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

Re: Fuzzy substring searching with the pg_trgm extension at 2016-01-29 15:39:51 from Alvaro Herrera
