First implementation of GIN for pg_trgm

From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Cc: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Subject: First implementation of GIN for pg_trgm
Date: 2007-02-22 00:00:09
Message-ID: 1d4e0c10702211600v7e0761c7ja533b949f6f79cad@mail.gmail.com
Lists: pgsql-patches

Hi all,

Here is my preliminary work on porting pg_trgm to GIN. pg_trgm can be
a very good addition to tsearch2 for suggesting spellings of misspelled
words, as explained in the README.pg_trgm file, and I'd like to use it
for that purpose. The GiST implementation is a bit slow, so I tried to
port it to GIN.

The attached patch is the first working implementation. It's not final,
but I would like some feedback on how to fix the remaining problems.

Following a previous discussion with Teodor, it would be better to store
an int in the index instead of a text (it takes less space and is
faster). I couldn't find any example, so if anyone has advice on how to
do that, it's welcome (mostly how to pack the trigram into an int instead
of a text).
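
To make the question concrete, here is a minimal sketch of one way the
packing could be done, assuming a trigram is exactly three single-byte
characters. The name pack_trigram is hypothetical and is not taken from
the attached patch; the idea is simply to shift the three bytes into the
low 24 bits of a 32-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): pack the three bytes of one
 * trigram into the low 24 bits of a 32-bit integer, first byte highest.
 * Assumes single-byte characters; multibyte input would need another scheme.
 */
static uint32_t
pack_trigram(const unsigned char *t)
{
    uint32_t val = 0;

    assert(t != NULL);

    val |= t[0];
    val <<= 8;
    val |= t[1];
    val <<= 8;
    val |= t[2];

    return val;
}
```

Because the first byte lands in the highest position, comparing two
packed values as unsigned integers orders them the same way as comparing
the original three bytes, which should suit an index key.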

The last problem is that the similarity calculated with the GIN index is
higher than the one calculated with GiST, so I have to set trgm_limit
quite high to get decent results (a limit of 0.8 instead of 0.3 seems to
be quite good).
AFAICS, this comes from the fact that I couldn't find any way to get the
length of the indexed trigram, which is taken into account with GiST, so
we're not filtering the results in exactly the same way.
Does anyone have an idea on how to fix this point?
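
The gap can be illustrated with a small sketch. It rests on two
assumptions that are not confirmed by the patch: that "length" above
means the number of trigrams in the indexed value, and that similarity is
the number of shared trigrams divided by the size of the union of both
trigram sets. Both function names are hypothetical. If only the query
side's length is known, the denominator can only be the query length,
which is never larger than the union, so the score can only go up.

```c
#include <assert.h>

/*
 * Hypothetical: similarity when both trigram counts are known.
 * shared / (len_a + len_b - shared), i.e. shared over the union size.
 */
static float
similarity_union(int shared, int len_a, int len_b)
{
    int union_size = len_a + len_b - shared;

    assert(shared >= 0 && shared <= len_a && shared <= len_b);

    if (union_size == 0)
        return 0.0f;
    return (float) shared / (float) union_size;
}

/*
 * Hypothetical: similarity when only the query's trigram count is known.
 * The denominator ignores the indexed value, so the result is an upper
 * bound on similarity_union() for the same inputs.
 */
static float
similarity_query_only(int shared, int len_query)
{
    assert(shared >= 0 && shared <= len_query);

    if (len_query == 0)
        return 0.0f;
    return (float) shared / (float) len_query;
}
```

For example, with 3 shared trigrams, a query of 4 trigrams and an indexed
value of 8 trigrams, the first function divides 3 by 9 while the second
divides 3 by 4, which would be consistent with needing a much higher
limit on the GIN side.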

Thanks for your attention.

--
Guillaume

Attachment Content-Type Size
pg_trgm_gin2.diff text/plain 5.8 KB
