Re: wildcard search support for pg_trgm

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wildcard search support for pg_trgm
Date: 2011-01-24 15:34:54
Message-ID: AANLkTik7sL1jWt-6Dg0C2s6ozj=QDXW9BcKzUtcoQwcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Mon, Jan 24, 2011 at 3:07 AM, Jan Urbański <wulczer(at)wulczer(dot)org> wrote:

> I see two issues with this patch. First of them is the resulting index
> size. I created a table with 5 copies of
> /usr/share/dict/american-english in it and a gin index on it, using
> gin_trgm_ops. The results were:
>
> * relation size: 18MB
> * index size: 109 MB
>
> while without the patch the GIN index was 43 MB. I'm not really sure
> *why* this happens, as it's not obvious from reading the patch what
> exactly is this extra data that gets stored in the index, making it more
> than double its size.
>
Do you sure that you did comparison correctly? The sequence of index
building and data insertion does matter. I tried to build gin index on 5
copies of /usr/share/dict/american-english with patch and got 43 MB index
size.

> That leads me to the second issue. The pg_trgm code is already woefully
> uncommented, and after spending quite some time reading it back and
> forth I have to admit that I don't really understand what the code does
> in the first place, and so I don't understand what does that patch
> change. I read all the changes in detail and I could't find any obvious
> mistakes like reading over array boundaries or dereferencing
> uninitialized pointers, but I can't tell if the patch is correct
> semantically. All test cases I threw at it work, though.
>
I'll try to write sufficient comment and send new revision of patch.

> This patch changes the names and signatures of some support functions
> for GIN, and I'm not sure how that affects binary compatibility and
> pg_upgrade. I tried to create an index with the vanilla source, and then
> recompile pg_trgm and reindex the table, but it still was not using the
> index. I think it's because it's missing entries in the catalogs about
> the index supporting the like strategy. How should this be handled?
>
This patch don't alters structure of index. It only adds strategies for
index scan. In order update this index one should recreate operator class
(it will require to drop index). It can be done by
sequential uninstall_pg_trgm.sql and pg_trgm.sql. After that new index can
be created and it will support like strategy. Although actually there is no
need of index recreation, I don't see easier way to do this.

----
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Krogh 2011-01-24 16:14:11 Re: wildcard search support for pg_trgm
Previous Message Magnus Hagander 2011-01-24 15:33:29 Re: Include WAL in base backup