Re: not using my GIN index in JOIN expression

From: Jean-Max Reymond <jmreymond(at)free(dot)fr>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: not using my GIN index in JOIN expression
Date: 2014-02-27 14:46:29
Message-ID: 530F4FC5.9040100@free.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Le 27/02/2014 15:19, Heikki Linnakangas a écrit :
> On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:
>> I am running the last version of PostgreSQL 9.3.3
>> I have two tables detm and corm and a lot of datas in the column
>> cormdata of corm table (1.4 GB).
>>
>> I have a GIN index on cormdata:
>> CREATE INDEX ix_corm_fulltext_cormdata ON corm
>> USING gin (to_tsvector('french'::regconfig, cormdata))
>> WHERE cormishtml IS FALSE AND length(cormdata) < 20000;
>>
>> select distinct b.detmmailid from corm b where
>> (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
>> b.cormishtml is false and length(b.cormdata) < 20000)
>> is very fast and use the GIN index.
>>
>> "HashAggregate (cost=2027.72..2031.00 rows=328 width=52)"
>> " -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548
>> width=52)"
>> " Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
>> to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
>> (length(cormdata) < 20000))"
>> " -> Bitmap Index Scan on ix_corm_fulltext_cormdata
>> (cost=0.00..24.11 rows=548 width=0)"
>> " Index Cond: (to_tsvector('french'::regconfig, cormdata)
>> @@ to_tsquery('mauritanie'::text))"
>>
>>
>> With a join an another table detm, GIN index is not used
>>
>>
>> explain select distinct a.detmmailid from detm a JOIN corm b on
>> a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
>> to_tsquery('mauritanie') and b.cormishtml is false and
>> length(b.cormdata) < 20000) OR ( detmobjet ~* 'mauritanie' ))
>>
>> "HashAggregate (cost=172418.27..172423.98 rows=571 width=52)"
>> " -> Hash Join (cost=28514.92..172416.85 rows=571 width=52)"
>> " Hash Cond: (b.detmmailid = a.detmmailid)"
>> " Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
>> to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
>> (length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
>> " -> Seq Scan on corm b (cost=0.00..44755.07 rows=449507
>> width=689)"
>> " -> Hash (cost=19322.74..19322.74 rows=338574 width=94)"
>> " -> Seq Scan on detm a (cost=0.00..19322.74 rows=338574
>> width=94)"
>>
>>
>> If I remove OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
>> index is used
>> explain select distinct a.detmmailid from detm a JOIN corm b on
>> a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
>> to_tsquery('mauritanie') and b.cormishtml is false and
>> length(b.cormdata) < 20000))
>>
>> "HashAggregate (cost=4295.69..4301.17 rows=548 width=52)"
>> " -> Nested Loop (cost=24.67..4294.32 rows=548 width=52)"
>> " -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548
>> width=52)"
>> " Recheck Cond: ((to_tsvector('french'::regconfig,
>> cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
>> AND (length(cormdata) < 20000))"
>> " -> Bitmap Index Scan on ix_corm_fulltext_cormdata
>> (cost=0.00..24.11 rows=548 width=0)"
>> " Index Cond: (to_tsvector('french'::regconfig,
>> cormdata) @@ to_tsquery('mauritanie'::text))"
>> " -> Index Only Scan using pkey_detm on detm a (cost=0.42..4.13
>> rows=1 width=52)"
>> " Index Cond: (detmmailid = b.detmmailid)"
>>
>> How can i force the use of the GIN index ?
>> thanks for your tips,
>
> The problem with the OR detmobject ~* 'mauritanie' restriction is that
> the rows that match that condition cannot be found using the GIN index.
> I think you'd want the system to fetch all the rows that match the other
> condition using the GIN index, and do something else to find the other
> rows. The planner should be able to do that if you rewrite the query as
> a UNION:
>
> select a.detmmailid from detm a JOIN corm b on
> a.detmmailid = b.detmmailid
> where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
> b.cormishtml is false and length(b.cormdata) < 20000)
> union
> select a.detmmailid from detm a JOIN corm b on
> a.detmmailid = b.detmmailid
> where detmobjet ~* 'mauritanie'
>
> Note that that will not return rows in 'detm' that have no matching rows
> in 'corm' table, even if they match the "detmobjet ~* 'mauritanie"
> condition. That's what your original query also did, but if that's not
> what you want, leave out the JOIN from the second part of the union.
>
> - Heikki

It works great: thanks a lot :-)

--
Jean-Max Reymond
CKR Solutions Open Source http://www.ckr-solutions.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Coogan 2014-02-27 16:57:42 Inefficient filter order in query plan
Previous Message Heikki Linnakangas 2014-02-27 14:19:55 Re: not using my GIN index in JOIN expression