Re: Trigram (pg_trgm) GIN index not used

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Ivan Voras <ivoras(at)freebsd(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Trigram (pg_trgm) GIN index not used
Date: 2013-02-21 14:23:34
Message-ID: CAHyXU0xZWHhHQjPjXKKsbS2PSX+O=+Wa-p-pw49varaOoWCo_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Feb 21, 2013 at 6:06 AM, Ivan Voras <ivoras(at)freebsd(dot)org> wrote:
> On 21/02/2013 12:52, Ivan Voras wrote:
>
>> I'd like to use pg_trgm for matching substrings case-insensitively, but
>> it doesn't seem to use the index:
>
>
> As a sort-of followup, the '%' operator kind of works but takes
> incredibly long time, and the selectivity estimates are completely wrong:
>
> nn=> vacuum analyze documents;
> VACUUM
> nn=> explain select id,title from documents where raw_data % 'zagreb';
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------
> Bitmap Heap Scan on documents (cost=128.42..330.87 rows=54 width=108)
> Recheck Cond: ((raw_data)::text % 'zagreb'::text)
> -> Bitmap Index Scan on documents_raw_data_trgm (cost=0.00..128.40
> rows=54 width=0)
> Index Cond: ((raw_data)::text % 'zagreb'::text)
> (4 rows)
>
> nn=> explain analyze select id,title from documents where raw_data %
> 'zagreb';
> QUERY
> PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on documents (cost=128.42..330.87 rows=54 width=108)
> (actual time=98750.283..98750.283 rows=0 loops=1)
> Recheck Cond: ((raw_data)::text % 'zagreb'::text)
> -> Bitmap Index Scan on documents_raw_data_trgm (cost=0.00..128.40
> rows=54 width=0) (actual time=26.748..26.748 rows=51874 loops=1)
> Index Cond: ((raw_data)::text % 'zagreb'::text)
> Total runtime: 98750.623 ms
> (5 rows)
>
>
> There is no IO load during this query.

pg_trgm is not really designed for indexing large documents, but for
fuzzy simple string (company name, address, etc) matching. probably
better off with full text search.

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2013-02-21 15:10:47 Re: subselects vs WITH in views
Previous Message Ioana Danes 2013-02-21 12:28:20 Ioana Danes