pg_trgm version 1.2

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_trgm version 1.2
Date: 2015-06-27 22:17:33
Message-ID: CAMkU=1woR_Pdmie6d-zj6sDOPiHd_iUe3vZSXFGe_i4-AQYsJQ@mail.gmail.com
Lists: pgsql-hackers

This patch implements version 1.2 of contrib module pg_trgm.

It adds support for the triconsistent function, introduced in version 9.4 of
the server, which makes indexed queries faster to execute when some keys are
common and some are rare.
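
To illustrate why a ternary answer helps, here is a minimal standalone sketch in C. It is not the code from the patch: the type and function names are made up for illustration, and it covers only the LIKE case, where every extracted trigram must be present. A single trigram known to be absent decides the outcome, no matter how many other trigrams are still unknown.

```c
#include <assert.h>
#include <stddef.h>

/* Three-valued logic. These names are illustrative only; they are
 * not the server's definitions. */
typedef enum { T_FALSE = 0, T_TRUE = 1, T_MAYBE = 2 } Ternary;

/*
 * Ternary consistency check for a LIKE-style query (hypothetical helper).
 * check[i] says whether trigram i is known present (T_TRUE), known
 * absent (T_FALSE), or not yet looked up (T_MAYBE).
 */
Ternary
like_triconsistent(const Ternary *check, size_t nkeys)
{
    assert(check != NULL || nkeys == 0);

    for (size_t i = 0; i < nkeys; i++)
    {
        /* One missing trigram rules the row out, whatever the rest are. */
        if (check[i] == T_FALSE)
            return T_FALSE;
    }

    /*
     * Trigram presence is only a necessary condition for a LIKE match,
     * so the best this sketch can say is "maybe"; the row would still
     * have to be rechecked against the actual pattern.
     */
    return T_MAYBE;
}
```

With this shape, the rare trigram can be looked up first, and the common ones never need to be consulted for rows where the rare one is absent.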

I've included the paths to both upgrade and downgrade between 1.1 and 1.2.
After doing either, you must close and restart the session before you can be
sure the change has taken effect. There is no change to the on-disk index
structure.

This shows the difference it can make in some cases:

create extension pg_trgm version "1.1";

create table foo as select
  md5(random()::text) ||
  case when random() < 0.000005 then 'lmnop' else '123' end ||
  md5(random()::text) as bar
from generate_series(1,10000000);

create index on foo using gin (bar gin_trgm_ops);

--some queries

alter extension pg_trgm update to "1.2";

--close, reopen, more queries

select count(*) from foo where bar like '%12344321lmnabcddd%';

V1.1: Time: 1743.691 ms --- after repeated execution to warm the cache

V1.2: Time: 2.839 ms --- after repeated execution to warm the cache

You could get the same benefit just by increasing MAX_MAYBE_ENTRIES (in
core) from 4 to some higher value (which it probably should be anyway, but
there will always be a case where it needs to be higher than you can afford
it to be, so a real solution is needed).
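
For readers who have not looked at that part of core, the following is a simplified standalone sketch of the idea, not the server's actual code. When only a boolean consistent function is available, a ternary answer can be emulated by trying every true/false assignment of the unknown keys, which costs 2^n calls and so has to be capped. All names here (Tri, emulate_ternary, MAX_MAYBE, MAX_KEYS) are hypothetical; only the cap of 4 comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TRI_FALSE = 0, TRI_TRUE = 1, TRI_MAYBE = 2 } Tri;

#define MAX_MAYBE 4     /* mirrors the limit of 4 mentioned above */
#define MAX_KEYS  64    /* arbitrary bound for this sketch */

typedef bool (*BoolConsistentFn)(const bool *check, size_t nkeys);

/* Example boolean rule: every key must be present (as for LIKE). */
bool
all_keys_present(const bool *check, size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++)
        if (!check[i])
            return false;
    return true;
}

/* Emulate a ternary answer using only a boolean consistent function. */
Tri
emulate_ternary(BoolConsistentFn fn, const Tri *check, size_t nkeys)
{
    size_t maybe_idx[MAX_MAYBE];
    size_t nmaybe = 0;
    bool   bcheck[MAX_KEYS];
    bool   first = false;

    assert(nkeys <= MAX_KEYS);

    for (size_t i = 0; i < nkeys; i++)
    {
        if (check[i] == TRI_MAYBE)
        {
            /* Too many unknowns to enumerate: give up and say "maybe". */
            if (nmaybe >= MAX_MAYBE)
                return TRI_MAYBE;
            maybe_idx[nmaybe++] = i;
        }
        bcheck[i] = (check[i] == TRI_TRUE);
    }

    /* Try all 2^nmaybe assignments of the unknown keys. */
    for (unsigned combo = 0; combo < (1u << nmaybe); combo++)
    {
        for (size_t j = 0; j < nmaybe; j++)
            bcheck[maybe_idx[j]] = (combo >> j) & 1u;

        bool r = fn(bcheck, nkeys);

        if (combo == 0)
            first = r;
        else if (r != first)
            return TRI_MAYBE;   /* the unknown keys affect the result */
    }
    return first ? TRI_TRUE : TRI_FALSE;
}
```

In this sketch, four unknown keys plus one key known to be absent yields TRI_FALSE, but five unknown keys plus the same absent key yields TRI_MAYBE, even though the absent key alone decides the answer. Raising the cap only moves that cliff; a function that reasons about ternary inputs directly does not have one.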

I wasn't sure if this should be a new version of pg_trgm or not, because
there is no user-visible change other than to performance. But there may
be some cases where it reduces performance, so it is nice to provide
options. Also, I'd like to use it in a back-branch, so versioning seems
to be the right way to go there.

There is a lot of code duplication between the binary consistent function
and the ternary one. I thought the duplication was necessary in order
to support both 1.1 and 1.2 from the same code base.

There may also be some gains in the similarity and regex cases, but I
didn't really analyze those for performance.

I've thought about how to document this change. Looking at the examples
of other contrib modules with multiple versions, I decided that we don't
document them, other than in the release notes.

The same patch applies to 9.4 code with a minor conflict in the Makefile,
and gives benefits there as well.

Cheers,

Jeff

Attachment              Content-Type              Size
pg_trgm_1_2_v001.patch  application/octet-stream  12.1 KB
