Re: Fuzzy substring searching with the pg_trgm extension

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fuzzy substring searching with the pg_trgm extension
Date: 2016-02-10 15:34:11
Message-ID: 56BB5873.1020503@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02.02.2016 15:45, Artur Zakirov wrote:
> On 01.02.2016 20:12, Artur Zakirov wrote:
>>
>> I have changed the patch:
>> 1 - trgm2.data was corrected, duplicates were deleted.
>> 2 - I have added operators <<-> and <->> with GiST index supporting. A
>> regression test will pass only with the patch
>> http://www.postgresql.org/message-id/CAPpHfdt19FwQXarYjkzxb3oxmv-KAn3FLuZrooARE_U3H3CV9g@mail.gmail.com
>>
>>
>> 3 - the function substring_similarity() was renamed to
>> subword_similarity().
>>
>> But there is not a function substring_similarity_pos() yet. It is not
>> trivial.
>>
>
> Sorry, in the previous patch was a typo. Here is the fixed patch.
>

I have attached a new version of the patch. It fixes error of operators
<->> and %>:
- operator <->> did not pass the regression test in CentOS 32 bit (gcc
4.4.7 20120313).
- operator %> did not pass the regression test in FreeBSD 32 bit (gcc
4.2.1 20070831).

It was because of variable optimization by gcc.

In this patch pg_trgm documentation was corrected. Now operators were
wrote as %> and <->> (not <% and <<->).

There is a problem in adding the substring_similarity_pos() function. It
can bring additional overhead. Because we need to store characters
position including spaces in addition. Spaces between words are lost in
current implementation.
Does it actually need?

In conclusion, this patch introduces:
1 - functions:
- subword_similarity()
2 - operators:
- %>
- <->>
3 - GUC variables:
- pg_trgm.sml_limit
- pg_trgm.subword_limit

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
pg_trgm_guc_v2.patch text/x-patch 8.8 KB
pg_trgm_subword_v7.patch text/x-patch 112.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2016-02-10 15:37:13 Re: extend pgbench expressions with functions
Previous Message Robert Haas 2016-02-10 15:21:11 Re: [PATCH] Refactoring of LWLock tranches