Re: pg_trgm version 1.2

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_trgm version 1.2
Date: 2015-06-30 09:46:53
Message-ID: CAPpHfdtUfkY7Hv+7dLQPYFWrW1xrUb1jvPrdwL84eyGN6y1E5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 28, 2015 at 1:17 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> This patch implements version 1.2 of contrib module pg_trgm.
>
> This supports the triconsistent function, introduced in version 9.4 of the
> server, to make it faster to implement indexed queries where some keys are
> common and some are rare.
>

Thank you for the patch! Lack of pg_trgm triconsistent support was
significant miss after "fast scan" implementation.

> I've included the paths to both upgrade and downgrade between 1.1 and 1.2,
> although after doing so you must close and restart the session before you
> can be sure the change has taken effect. There is no change to the on-disk
> index structure
>

pg_trgm--1.1.sql andpg_trgm--1.1--1.2.sql are useful for debug, but do you
expect them in final commit? As I can see in other contribs we have only
last version and upgrade scripts.

This shows the difference it can make in some cases:

>
> create extension pg_trgm version "1.1";
>
> create table foo as select
>
> md5(random()::text)|| case when random()<0.000005 then 'lmnop' else
> '123' end ||
>
> md5(random()::text) as bar
>
> from generate_series(1,10000000);
>
> create index on foo using gin (bar gin_trgm_ops);
>
> --some queries
>
> alter extension pg_trgm update to "1.2";
>
> --close, reopen, more queries
>
>
> select count(*) from foo where bar like '%12344321lmnabcddd%';
>
>
>
> V1.1: Time: 1743.691 ms --- after repeated execution to warm the cache
>
> V1.2: Time: 2.839 ms --- after repeated execution to warm the cache
>

Nice!

> You could get the same benefit just by increasing MAX_MAYBE_ENTRIES (in
> core) from 4 to some higher value (which it probably should be anyway, but
> there will always be a case where it needs to be higher than you can afford
> it to be, so a real solution is needed).
>

Actually, it depends on how long it takes to calculate consistent function.
The cheaper consistent function is, the higher MAX_MAYBE_ENTRIES could be.
Since all functions in PostgreSQL may define its cost, GIN could calculate
MAX_MAYBE_ENTRIES from the cost of consistent function.

I wasn't sure if this should be a new version of pg_trgm or not, because
> there is no user visible change other than to performance. But there may
> be some cases where it results in performance reduction and so it is nice
> to provide options. Also, I'd like to use it in a back-branch, so versions
> seems to be the right way to go there.
>

We definitely have to stamp a new version of extension every time when its
SQL-definition changes. By the current design of GIN, providing both
consistent and triconsistent function should be optimal for performance. If
it's not so then it's a problem of GIN, not opclass.

There is a lot of code duplication between the binary consistent function
> and the ternary one. I thought it the duplication was necessary in order
> to support both 1.1 and 1.2 from the same code base.
>

For me current code duplication is OK.

There may also be some gains in the similarity and regex cases, but I
> didn't really analyze those for performance.
>

> I've thought about how to document this change. Looking to other example
> of other contrib modules with multiple versions, I decided that we don't
> document them, other than in the release notes.
>
>
> The same patch applies to 9.4 code with a minor conflict in the Makefile,
> and gives benefits there as well.
>

Some other notes about the patch:
* You allocate boolcheck array and don't use it.
* Check coding style and formatting, in particular "check[i]==GIN_TRUE"
should be "check[i] == GIN_TRUE".
* I think some comments needed in gin_trgm_triconsistent() about
trigramsMatchGraph(). gin_trgm_triconsistent() may use trigramsMatchGraph()
that way because trigramsMatchGraph() implements monotonous boolean
function.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-06-30 09:48:59 Re: Refactor to split nodeAgg.c?
Previous Message Andres Freund 2015-06-30 09:38:26 Re: anole: assorted stability problems