Re: Can pg_trgm handle non-alphanumeric characters?

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	MauMau <maumau307(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Euler Taveira <euler(at)timbira(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Can pg_trgm handle non-alphanumeric characters?
Date:	2012-05-11 15:53:57
Message-ID:	CAHGQGwHMru9oYhcPSHr39tU_cnggw7+kX8BJjh6yT4o4_DB2GQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, May 11, 2012 at 4:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> On Fri, May 11, 2012 at 12:07 AM, MauMau <maumau307(at)gmail(dot)com> wrote:
>>> Thanks for your explanation. Although I haven't understood it well yet, I'll
>>> consider what you taught. And I'll consider if the tentative measure of
>>> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
>>> against Japanese text.
>
>> In Japanese, it's common to do a text search with a two-character keyword.
>> But since pg_trgm is 3-gram based, you basically would not be able to use
>> an index for such a text search. So you might need something like pg_bigm
>> or pg_unigm for Japanese text search.

Even if an index can be used for a two-character text search, the bitmap
index scan picks up all rows, so it's too slow.
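
To illustrate the point about two-character keywords, here is a minimal
Python sketch. It is not pg_trgm's code: the `windows` helper is hypothetical,
and it ignores the padding pg_trgm applies at word boundaries. It only shows
that a two-character keyword contains no complete three-character window,
while a 2-gram scheme would get one usable key from it.

```python
def windows(text: str, n: int = 3) -> list[str]:
    """Return every n-character sliding window of text (no padding)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


keyword = "東京"  # a two-character keyword

# No complete 3-character window exists, so a 3-gram index has
# nothing selective to look up for this keyword.
print(windows(keyword))

# A 2-gram scheme (the idea behind something like pg_bigm) gets one key.
print(windows(keyword, 2))
```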

> I believe the trigrams are three *bytes* not three characters. So a
> couple of kanji should work just fine for this.

Really? As far as I can tell from reading the pg_trgm code, a trigram is three
characters, and its CRC32 is used as the index key when the trigram is longer
than three bytes.
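
A rough Python sketch of that reading of the code, under stated assumptions:
the helpers `char_trigrams` and `index_key` are hypothetical, the text is
assumed to be UTF-8, and `zlib.crc32` stands in for whatever CRC32 variant
and key layout pg_trgm really uses. It shows only the distinction between
three characters and three bytes.

```python
import zlib


def char_trigrams(text: str) -> list[str]:
    """Split text into overlapping trigrams of three characters each."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def index_key(trigram: str) -> bytes:
    """Use the raw bytes if they fit in three bytes, else a CRC32 of them.

    Assumption: zlib.crc32 is a stand-in hash; pg_trgm's exact CRC and
    key layout are not reproduced here.
    """
    raw = trigram.encode("utf-8")
    if len(raw) > 3:
        return zlib.crc32(raw).to_bytes(4, "big")
    return raw


for word in ("abc", "日本語"):
    for tg in char_trigrams(word):
        size = len(tg.encode("utf-8"))
        print(tg, "chars:", len(tg), "bytes:", size, "key:", index_key(tg).hex())
```

Under these assumptions, an ASCII trigram is stored as-is, while a trigram of
three kanji occupies nine bytes in UTF-8 and is therefore represented by its
hash.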

Regards,

--
Fujii Masao

In response to

Re: Can pg_trgm handle non-alphanumeric characters? at 2012-05-10 19:11:57 from Tom Lane