pg_trgm partial-match

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_trgm partial-match
Date: 2012-11-15 19:39:21
Message-ID: CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

I'd like to propose extending pg_trgm so that it can compare a partial-match
query key against a GIN index. IOW, I'm thinking of implementing the
'comparePartial' GIN method for pg_trgm.
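
For illustration only, and not taken from the attached patch: the sketch
below mimics the return-value convention of a GIN 'comparePartial' method
using plain C strings instead of Datum values. The function name
trgm_prefix_compare is hypothetical, and the sketch assumes index keys are
visited in byte order.

```c
#include <string.h>

/*
 * Hypothetical stand-in for a GIN comparePartial callback, working on
 * plain C strings rather than Datum values.
 *
 * Return value follows the comparePartial convention:
 *    0  the index key starts with the partial query key (match)
 *   <0  no match, but a later index key may still match (keep scanning)
 *   >0  no match, and no later index key can match (stop the scan)
 */
static int
trgm_prefix_compare(const char *partial_key, const char *index_key)
{
    size_t plen = strlen(partial_key);
    int    cmp = strncmp(index_key, partial_key, plen);

    if (cmp == 0)
        return 0;               /* index key has the query key as prefix */
    return (cmp < 0) ? -1 : 1;  /* before the prefix range, or past it */
}
```

With a query key of "ab", an index key of "abc" matches, "aa" tells the
scan to continue, and "ac" tells it to stop.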

Currently, when the query key is shorter than three characters, we cannot
use a GIN index (+ pg_trgm) efficiently, because pg_trgm doesn't support a
partial-match method. In this case, a seq scan or a full index scan is
executed, and the response time is very slow. I'd like to alleviate this
problem.

Note that we cannot do a partial match if KEEPONLYALNUM is disabled,
i.e., if the query key contains multibyte characters. In this case, the byte
length of the trigram string might be larger than three, and its CRC is used
as the trigram key instead of the trigram string itself. Because the CRC is
used, we cannot do a partial match. The attached patch extends pg_trgm so
that it compares a partial-match query key only when KEEPONLYALNUM is
enabled.
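
As a rough sketch of the condition described above (not code from pg_trgm
or from the patch): a three-character trigram can serve as its own key only
when it also fits in three bytes. The helper names below are hypothetical,
and UTF-8 encoding is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters by skipping continuation bytes (10xxxxxx). */
static size_t
utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/*
 * Hypothetical helper: true if a trigram of three characters occupies
 * exactly three bytes, so the raw bytes can be used as the key and a
 * prefix comparison is meaningful. If it needs more than three bytes,
 * a hash (CRC) of the string would be stored instead, and the prefix
 * relationship between keys is lost.
 */
static bool
trigram_stored_as_raw_bytes(const char *trigram)
{
    return utf8_char_count(trigram) == 3 && strlen(trigram) == 3;
}
```

For example, "abc" is three characters in three bytes, while a trigram
containing one two-byte character is three characters in four bytes and
would have to be represented by its CRC.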

The attached patch is still WIP. What I should do next is:

* bump the pg_trgm version from 1.0 to 1.1, i.e., create pg_trgm--1.1.sql, etc.
* write the regression tests

Comments? Review? Objection?

Regards,

--
Fujii Masao

Attachment Content-Type Size
trgm_compare_partial_v0.patch application/octet-stream 5.4 KB
