Re: Fixing GIN for empty/null/full-scan cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Abe Ingersoll <abe(at)abe(dot)us>
Subject: Re: Fixing GIN for empty/null/full-scan cases
Date: 2011-01-18 23:46:48
Message-ID: 6785.1295394408@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> No, I see no reason to think that has much to do with it. I'm wondering
> if your table is itself a bit bloated ...

Actually ... I notice you did not show EXPLAIN ANALYZE output for your
tests. Now I'm wondering whether you tested the right thing at all.
I got burnt that way too. Observe:

regression=# create index idx_gin_features on listings using gin(features) WHERE deleted_at IS NULL AND status = 1;
CREATE INDEX
regression=# explain analyze SELECT count(*) FROM listings
WHERE features @@ '(1368799&1368800&1369043)'::query_int
AND deleted_at IS NULL AND status = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9158.24..9158.25 rows=1 width=0) (actual time=153.633..153.634 rows=1 loops=1)
-> Seq Scan on listings (cost=0.00..9157.22 rows=406 width=0) (actual time=0.048..153.493 rows=772 loops=1)
Filter: ((deleted_at IS NULL) AND (features @@ '1368799 & 1368800 & 1369043'::query_int) AND (status = 1))
Total runtime: 153.713 ms
(4 rows)

regression=# set enable_seqscan TO 0;
SET
regression=# explain analyze SELECT count(*) FROM listings
WHERE features @@ '(1368799&1368800&1369043)'::query_int
AND deleted_at IS NULL AND status = 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=13253.42..13253.43 rows=1 width=0) (actual time=331.990..331.990 rows=1 loops=1)
-> Bitmap Heap Scan on listings (cost=4095.18..13252.40 rows=406 width=0) (actual time=164.785..331.858 rows=772 loops=1)
Recheck Cond: ((deleted_at IS NULL) AND (status = 1))
Filter: (features @@ '1368799 & 1368800 & 1369043'::query_int)
-> Bitmap Index Scan on idx_gin_features (cost=0.00..4095.07 rows=406215 width=0) (actual time=164.045..164.045 rows=406215 loops=1)
Total runtime: 332.169 ms
(6 rows)

The above is "using" the index, but only as a guide to where the rows
satisfying the partial-index predicate are --- note the lack of any
index condition in the indexscan node. That's because the query_int
query is not in fact compatible with the core-provided index opclass.
We get much better results using intarray's gin__int_ops opclass:

regression=# drop index idx_gin_features;
DROP INDEX
regression=# create index idx_gin_features on listings using gin(features gin__int_ops) WHERE deleted_at IS NULL AND status = 1;
CREATE INDEX
regression=# explain analyze SELECT count(*) FROM listings
WHERE features @@ '(1368799&1368800&1369043)'::query_int
AND deleted_at IS NULL AND status = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1159.20..1159.21 rows=1 width=0) (actual time=23.896..23.896 rows=1 loops=1)
-> Bitmap Heap Scan on listings (cost=31.15..1158.18 rows=406 width=0) (actual time=22.912..23.813 rows=772 loops=1)
Recheck Cond: ((features @@ '1368799 & 1368800 & 1369043'::query_int) AND (deleted_at IS NULL) AND (status = 1))
-> Bitmap Index Scan on idx_gin_features (cost=0.00..31.05 rows=406 width=0) (actual time=22.811..22.811 rows=772 loops=1)
Index Cond: (features @@ '1368799 & 1368800 & 1369043'::query_int)
Total runtime: 24.036 ms
(6 rows)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simone Aiken 2011-01-18 23:49:29 Re: ToDo List Item - System Table Index Clustering
Previous Message David E. Wheeler 2011-01-18 23:45:15 Re: Fixing GIN for empty/null/full-scan cases