Re: pg_trgm version 1.2

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_trgm version 1.2
Date: 2015-07-20 14:17:49
Message-ID: CAPpHfdskHSDixhh6-95vDefNwUuwTRYwnNs0xfRCE6BtSxUw+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 15, 2015 at 12:31 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Tue, Jul 7, 2015 at 6:33 AM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>>
>>
>> See Tom Lane's comment about downgrade scripts. I think just remove it is
>> a right solution.
>>
>
> The new patch removes the downgrade path and the ability to install the
> old version.
>
> (If anyone wants an easy downgrade path for testing, they can keep using
> the prior patch--no functional changes)
>
> It also added a comment before the trigramsMatchGraph call.
>
> I retained the palloc and the loop to promote the ternary array to a
> binary array. While I also think it is tempting to get rid of that by
> abusing the type system and would do it that way in my own standalone code,
> it seems contrary to way the project usually does things. And I couldn't
> measure a difference in performance.
>

Ok!

> ....
>
>
>
>> Let's consider '^(?!.*def).*abc' regular expression as an example. It
>> matches strings which contains 'abc' and don't contains 'def'.
>>
>> # SELECT 'abc' ~ '^(?!.*def).*abc';
>> ?column?
>> ----------
>> t
>> (1 row)
>>
>> # SELECT 'def abc' ~ '^(?!.*def).*abc';
>> ?column?
>> ----------
>> f
>> (1 row)
>>
>> # SELECT 'abc def' ~ '^(?!.*def).*abc';
>> ?column?
>> ----------
>> f
>> (1 row)
>>
>> Theoretically, our trigram regex processing could support negative
>> matching of 'def' trigram, i.e. trigramsMatchGraph(abc = true, def = false)
>> = true but trigramsMatchGraph(abc = true, def = true) = false. Actually, it
>> doesn't because trigramsMatchGraph() implements a monotonic function. I
>> just think it should be stated explicitly.
>>
>
> Do you think it is likely to change to stop being monotonic and so support
> the (def=GIN_TRUE) => false case?
>
> ^(?!.*def) seems like a profoundly niche situation. (Although one that
> I might actually start using myself now that I know it isn't just a
> Perl-ism).
>
> It doesn't make any difference to this patch, other than perhaps how to
> word the comments.
>

Yes, it was just about the comments.

I also run few tests on real-life dataset: set of dblp paper titles. As it
was expected, there is huge speedup when pattern is long enough.

# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
count
-------
318
(1 row)

Time: 29,114 ms

# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
count
-------
318
(1 row)

Time: 26,273 ms

# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
count
-------
318
(1 row)

Time: 249,725 ms

# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
count
-------
318
(1 row)

Time: 218,627 ms

For me, patch is in the pretty good shape. I'm going to mark it "Ready for
committer".

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2015-07-20 14:53:29 Re: First Aggregate Funtion?
Previous Message Heikki Linnakangas 2015-07-20 14:15:02 Re: All-zero page in GIN index causes assertion failure