Re: GIN improvements part2: fast scan

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GIN improvements part2: fast scan
Date: 2014-03-11 22:09:24
Message-ID: 531F8994.3040008@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

a quick question that just occured to me - do you plan to tweak the cost
estimation fot GIN indexes, in this patch?

IMHO it would be appropriate, given the improvements and gains, but it
seems to me gincostestimate() was not touched by this patch.

I just ran into this while testing some jsonb stuff, and after creating
a GIN and GIST indexes on the same column, I get these two plans:

=======================================================================

db=# explain analyze select count(*) from messages_2 where headers ?
'x-virus-scanned';

QUERY PLAN
------------------------------------------------------------------------
Aggregate (cost=1068.19..1068.20 rows=1 width=0) (actual
time=400.149..400.150 rows=1 loops=1)
-> Bitmap Heap Scan on messages_2 (cost=10.44..1067.50 rows=278
width=0) (actual time=27.974..395.840 rows=70499 loops=1)
Recheck Cond: (headers ? 'x-virus-scanned'::text)
Rows Removed by Index Recheck: 33596
Heap Blocks: exact=40978
-> Bitmap Index Scan on messages_2_gist_idx (cost=0.00..10.37
rows=278 width=0) (actual time=21.762..21.762 rows=104095 loops=1)
Index Cond: (headers ? 'x-virus-scanned'::text)
Planning time: 0.052 ms
Total runtime: 400.179 ms
(9 rows)

Time: 400,467 ms

db=# drop index messages_2_gist_idx;
DROP INDEX

db=# explain analyze select count(*) from messages_2 where headers ?
'x-virus-scanned';
QUERY PLAN
------------------------------------------------------------------------
Aggregate (cost=1083.91..1083.92 rows=1 width=0) (actual
time=39.130..39.130 rows=1 loops=1)
-> Bitmap Heap Scan on messages_2 (cost=26.16..1083.22 rows=278
width=0) (actual time=11.285..36.248 rows=70499 loops=1)
Recheck Cond: (headers ? 'x-virus-scanned'::text)
Heap Blocks: exact=23896
-> Bitmap Index Scan on messages_2_gin_idx (cost=0.00..26.09
rows=278 width=0) (actual time=7.974..7.974 rows=70499 loops=1)
Index Cond: (headers ? 'x-virus-scanned'::text)
Planning time: 0.064 ms
Total runtime: 39.160 ms
(8 rows)

Time: 39,509 ms

=======================================================================

So while the GIN plans seems to be just slightly expensive than GIN,
it's actually way faster.

Granted, most won't have GIN and GIST index on the same column at the
same time, but bad cost estimate may cause other issues. Maybe I could
achieve this by tweaking the various cost GUCs, but ISTM that tweaking
the cost estimation would be appropriate.

regards
Tomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2014-03-11 22:58:30 Re: jsonb and nested hstore
Previous Message David Johnston 2014-03-11 20:41:03 Re: The case against multixact GUCs