Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jan Przemysław Wójcik <jan(dot)przemyslaw(dot)wojcik(at)gmail(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
Date: 2017-12-11 20:45:38
Message-ID: CAPpHfdtXJp0xvi8QbcHWqnrk=XyyMux4FtbJCgZFPknPbLERVA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Fri, Dec 8, 2017 at 2:50 PM, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> On Thu, Dec 7, 2017 at 8:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
>> <jan(dot)przemyslaw(dot)wojcik(at)gmail(dot)com> wrote:
>> > I'm afraid that creating a function that implements quite different
>> > algorithms depending on a global parameter seems very hacky and would
>> > lead to misunderstandings. I do understand the need of backward
>> > compatibility, but I'd opt for the lesser evil. Perhaps a good idea
>> > would be to change the name to 'substring_similarity()' and introduce
>> > the new function 'word_similarity()' later, for example in the next
>> > major version release.
>>
>> That breaks things for everybody using word_similarity() currently.
>> If the previous discussion of this topic concluded that
>> word_similarity() was an OK name despite being a slight misnomer, I
>> don't think we should change our mind now. Instead the new function
>> can be called something which makes the difference clear, e.g.
>> strict_word_similarity(), and the old function can remain as it is.
>
>
> +1
> Thank you for pointing this out. Yes, it would be better not to change
> existing names and behavior, but to adjust the documentation and add the
> alternative behavior under another name.
> Therefore, I'm going to provide a patchset of two patches:
> 1) Improve word_similarity() documentation.
> 2) Add a new function strict_word_similarity() (or whatever better name we
> invent).
>

Please find the patchset attached.

0001-pg-trgm-word-similarity-docs-improvement.patch: improves the
documentation of word_similarity() and the related operators. I decided to
give the formal definition first (what exactly the function does internally),
followed by an example and a more human-readable description. This patch also
fixes two comments in which the lower and upper bounds were mixed up.

0002-pg-trgm-strict_word-similarity.patch: implements
strict_word_similarity(), with comments, docs and tests.
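
To illustrate the distinction between the two functions, here is a minimal
brute-force sketch in Python. It is not the code from the patches. It assumes
that word_similarity() takes the best match over any continuous extent of the
second string's ordered trigrams, while strict_word_similarity() only considers
extents that start and end at word boundaries. The function names
(sketch_word_similarity and so on) and the simplified word splitting
(lowercase ASCII letters and digits only) are assumptions made for this example.

```python
import re


def trigrams_per_word(text):
    """Return one ordered trigram list per word.

    Each word is padded with two leading spaces and one trailing space,
    mirroring the padding pg_trgm applies. Word splitting is simplified
    here to runs of ASCII letters and digits (an assumption of this sketch).
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    result = []
    for word in words:
        padded = "  " + word + " "
        result.append([padded[i:i + 3] for i in range(len(padded) - 2)])
    return result


def set_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams in both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sketch_word_similarity(query, text):
    """Best similarity against ANY continuous extent of the text's trigrams."""
    query_set = {t for word in trigrams_per_word(query) for t in word}
    ordered = [t for word in trigrams_per_word(text) for t in word]
    best = 0.0
    for lo in range(len(ordered)):
        for hi in range(lo + 1, len(ordered) + 1):
            best = max(best, set_similarity(query_set, set(ordered[lo:hi])))
    return best


def sketch_strict_word_similarity(query, text):
    """Best similarity against extents made of one or more WHOLE words."""
    query_set = {t for word in trigrams_per_word(query) for t in word}
    words = trigrams_per_word(text)
    best = 0.0
    for lo in range(len(words)):
        for hi in range(lo + 1, len(words) + 1):
            extent = {t for word in words[lo:hi] for t in word}
            best = max(best, set_similarity(query_set, extent))
    return best


if __name__ == "__main__":
    print(sketch_word_similarity("word", "two words"))
    print(sketch_strict_word_similarity("word", "two words"))
```

For 'word' against 'two words', the non-strict sketch can stop its extent in
the middle of 'words' and scores 4/5 = 0.8, while the strict variant has to
take the whole word 'words' and scores 4/7, roughly 0.571.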

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment                                            Content-Type              Size
0001-pg-trgm-word-similarity-docs-improvement.patch   application/octet-stream  4.2 KB
0002-pg-trgm-strict_word-similarity.patch             application/octet-stream  85.6 KB
