Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries
Date: 2013-05-31 16:48:19
Message-ID: CAD21AoDBK+LKUKm9atWWYHpoP7SAJOLA+FEKBpbGaSKH_kNiXA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 31, 2013 at 11:16 AM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Fri, May 31, 2013 at 4:25 AM, Alexander Korotkov
> <aekorotkov(at)gmail(dot)com> wrote:
>> On Thu, May 30, 2013 at 12:49 PM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
>> wrote:
>>>
>>> The following email discusses partial matching in pg_trgm. I hope
>>> this helps.
>>>
>>> <http://www.postgresql.org/message-id/CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ@mail.gmail.com>
>>> As you may know, if the search string contains multibyte characters,
>>> the trigram key is converted to a 4-byte CRC, which is used as the key
>>> (though only the upper 3 bytes of the CRC are used),
>>> so we can do partial matching if KEEPONLYALNUM is enabled.
>>
>>
>> Please read the further discussion on that thread. We can't do partial
>> matching because of the CRC, independently of KEEPONLYALNUM.
>>
>
> Also, a few more questions:
>
> 1) When building a trgm index, are there any differences for
> multi-byte character strings? For example, would a 2-character
> Japanese string (multi-byte, of course) produce exactly 3 trigrams to
> be stored in the index, which would later be used during look-up?

In the above case, a 2-character multibyte string produces 3 trigrams,
each stored as a CRC (because each trigram is longer than 3 bytes),
and these are used during look-up.
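
To illustrate, here is a rough sketch in Python (not the actual pg_trgm C
code). It assumes the usual padding of two spaces before and one space
after the word. zlib.crc32 is only a stand-in for the CRC that pg_trgm
uses, and keeping the three most significant bytes is my reading of
"upper 3 bytes", so the hashed values will not match what is really stored
in the index. The name trigram_keys is made up for this example, and
lower-casing and word splitting are left out.

```python
import zlib
from typing import List


def trigram_keys(word: str) -> List[bytes]:
    """Return one 3-byte key per trigram of a single padded word."""
    # Assumed padding: two spaces in front, one space behind.
    padded = "  " + word + " "
    keys = []
    for i in range(len(padded) - 2):
        raw = padded[i:i + 3].encode("utf-8")
        if len(raw) == 3:
            # Three single-byte characters: the key is the trigram itself.
            keys.append(raw)
        else:
            # Longer than 3 bytes: hash it and keep 3 bytes of the CRC.
            # zlib.crc32 is a stand-in, not PostgreSQL's CRC routine.
            crc = zlib.crc32(raw)
            keys.append(crc.to_bytes(4, "big")[:3])
    return keys


if __name__ == "__main__":
    print(trigram_keys("ab"))         # three plain keys
    print(len(trigram_keys("日本")))  # three hashed keys
```

With "ab" the three keys are the padded trigrams themselves. With "日本"
all three trigrams are longer than 3 bytes, so every key is a hash.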

> 2) And if that is so, is there a problem in gin_extract_query_trgm(),
> that is, while generating trigrams from a query search term, that causes
> trigrams (stored in the index if the answer to (1) is yes) NOT to be
> used in such a partial matching case?

It means that we can't use trigrams for partial matching because the
trigrams stored in the index have been converted to a different
value (the CRC).
Right?
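
A rough way to see the problem, again as a Python sketch under the same
assumptions (zlib.crc32 as a stand-in for the real CRC, three most
significant bytes kept; index_key and prefix_scan_can_match are made-up
names, not pg_trgm functions). A partial match would compare the bytes of
a 2-character query against the front of each stored key. That works while
the key is the trigram itself, but not once the key is a hash.

```python
import zlib


def index_key(trigram: str) -> bytes:
    """Key stored for one trigram: the raw bytes, or 3 bytes of a CRC."""
    raw = trigram.encode("utf-8")
    if len(raw) == 3:
        return raw
    # Stand-in hash; the original bytes are not recoverable from it.
    return zlib.crc32(raw).to_bytes(4, "big")[:3]


def prefix_scan_can_match(query_prefix: str, stored_trigram: str) -> bool:
    """Would a byte-wise prefix comparison find the stored key?"""
    return index_key(stored_trigram).startswith(query_prefix.encode("utf-8"))


if __name__ == "__main__":
    print(prefix_scan_can_match("ab", "abc"))       # True
    print(prefix_scan_can_match("日本", "日本語"))  # False
```

In the second call the query prefix is 6 bytes of UTF-8 while the stored
key is a 3-byte hash, so a byte-wise prefix comparison has nothing to
match against.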

Regards,
--
-------
Sawada Masahiko
