Re: not using my GIN index in JOIN expression

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jean-Max Reymond <jmreymond(at)free(dot)fr>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: not using my GIN index in JOIN expression
Date: 2014-02-27 14:19:55
Message-ID: 530F498B.6010004@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:
> I am running the last version of PostgreSQL 9.3.3
> I have two tables detm and corm and a lot of datas in the column
> cormdata of corm table (1.4 GB).
>
> I have a GIN index on cormdata:
> CREATE INDEX ix_corm_fulltext_cormdata ON corm
> USING gin (to_tsvector('french'::regconfig, cormdata))
> WHERE cormishtml IS FALSE AND length(cormdata) < 20000;
>
> select distinct b.detmmailid from corm b where
> (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
> b.cormishtml is false and length(b.cormdata) < 20000)
> is very fast and use the GIN index.
>
> "HashAggregate (cost=2027.72..2031.00 rows=328 width=52)"
> " -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548 width=52)"
> " Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
> to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
> (length(cormdata) < 20000))"
> " -> Bitmap Index Scan on ix_corm_fulltext_cormdata
> (cost=0.00..24.11 rows=548 width=0)"
> " Index Cond: (to_tsvector('french'::regconfig, cormdata)
> @@ to_tsquery('mauritanie'::text))"
>
>
> With a join an another table detm, GIN index is not used
>
>
> explain select distinct a.detmmailid from detm a JOIN corm b on
> a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
> to_tsquery('mauritanie') and b.cormishtml is false and
> length(b.cormdata) < 20000) OR ( detmobjet ~* 'mauritanie' ))
>
> "HashAggregate (cost=172418.27..172423.98 rows=571 width=52)"
> " -> Hash Join (cost=28514.92..172416.85 rows=571 width=52)"
> " Hash Cond: (b.detmmailid = a.detmmailid)"
> " Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
> to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
> (length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
> " -> Seq Scan on corm b (cost=0.00..44755.07 rows=449507
> width=689)"
> " -> Hash (cost=19322.74..19322.74 rows=338574 width=94)"
> " -> Seq Scan on detm a (cost=0.00..19322.74 rows=338574
> width=94)"
>
>
> If I remove OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
> index is used
> explain select distinct a.detmmailid from detm a JOIN corm b on
> a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
> to_tsquery('mauritanie') and b.cormishtml is false and
> length(b.cormdata) < 20000))
>
> "HashAggregate (cost=4295.69..4301.17 rows=548 width=52)"
> " -> Nested Loop (cost=24.67..4294.32 rows=548 width=52)"
> " -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548
> width=52)"
> " Recheck Cond: ((to_tsvector('french'::regconfig,
> cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
> AND (length(cormdata) < 20000))"
> " -> Bitmap Index Scan on ix_corm_fulltext_cormdata
> (cost=0.00..24.11 rows=548 width=0)"
> " Index Cond: (to_tsvector('french'::regconfig,
> cormdata) @@ to_tsquery('mauritanie'::text))"
> " -> Index Only Scan using pkey_detm on detm a (cost=0.42..4.13
> rows=1 width=52)"
> " Index Cond: (detmmailid = b.detmmailid)"
>
> How can i force the use of the GIN index ?
> thanks for your tips,

The problem with the OR detmobject ~* 'mauritanie' restriction is that
the rows that match that condition cannot be found using the GIN index.
I think you'd want the system to fetch all the rows that match the other
condition using the GIN index, and do something else to find the other
rows. The planner should be able to do that if you rewrite the query as
a UNION:

select a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid
where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
b.cormishtml is false and length(b.cormdata) < 20000)
union
select a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid
where detmobjet ~* 'mauritanie'

Note that that will not return rows in 'detm' that have no matching rows
in 'corm' table, even if they match the "detmobjet ~* 'mauritanie"
condition. That's what your original query also did, but if that's not
what you want, leave out the JOIN from the second part of the union.

- Heikki

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jean-Max Reymond 2014-02-27 14:46:29 Re: not using my GIN index in JOIN expression
Previous Message Jean-Max Reymond 2014-02-27 14:06:56 not using my GIN index in JOIN expression