Re: pg_trgm word_similarity inconsistencies or bug

From: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Cristiano Coelho <cristianocca(at)hotmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: pg_trgm word_similarity inconsistencies or bug
Date: 2017-10-28 08:22:29
Message-ID: 20171028082225.GA2157@arthur.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Fri, Oct 27, 2017 at 06:48:08PM +0000, Cristiano Coelho wrote:
> Hello all, this is related to postgres 9.6 (9.6.4) and a good description can be found here https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words
>
> But in summary, word_similarity doesn’t seem to do exactly what the docs say, since it will match trigrams from multiple words rather tan doing a word by word comparison.
>
> Below is a table with output and expected output, thanks to kiln from stackoverflow to provide it.
>

Interesting. An klin's answer from stackoverflow.com is right.

The initial example can be reduced to the next:

=# select word_similarity('sage', 'age sag');
word_similarity
-----------------
1

It computes maximum similarity using closest trigrams not considering order of
'sage' trigrams. It determines that all
trigrams from 'sage' match trigrams from 'age sag'.

Initial order of 'age sag' trigrams:
' a', ' ag', 'age', 'ge ', ' s', ' sa', 'sag', 'ag '
^ ^
|from |to
Sorted 'sage' trigrams (all of them occured within 'age sag' trigrams
continuously):
' s', ' sa', 'age', 'ge ', 'sag'

Maybe the problem should be solved by considering 'sage' trigrams
initial order.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Henri KY 2017-10-30 11:14:16 Re: BUG #14874: Dublicate values in primary key
Previous Message Tom Lane 2017-10-28 03:56:57 Re: BUG #14874: Dublicate values in primary key

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-10-28 09:13:20 Re: unsafe tuple releasing in get_default_partition_oid
Previous Message Julien Rouhaud 2017-10-28 08:03:10 unsafe tuple releasing in get_default_partition_oid