From: | Thomas Kellerer <spam_eater(at)gmx(dot)net> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Bloom index cost model seems to be wrong |
Date: | 2019-02-12 15:08:25 |
Message-ID: | dc54f6e8-c6bf-6869-08c2-de855de470ae@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
I stumbled upon this question:
https://dba.stackexchange.com/questions/229413
in a nutshell: the bloom index is not used with the example from the manual.
The bloom index is only used if either Seq Scan is disabled or if the random_page_cost is set to 1 (anything about 1 triggers a Seq Scan on my Windows laptop).
If parallel execution is disabled, then the bloom index is only used if the random_page_cost is lower than 4.
This does not use the index:
set random_page_cost = 4;
set max_parallel_workers_per_gather=0;
explain (analyze, buffers)
select *
from tbloom
where i2 = 898732
and i5 = 123451;
This uses the bloom index:
set random_page_cost = 3.5;
set max_parallel_workers_per_gather=0;
explain (analyze, buffers)
select *
from tbloom
where i2 = 898732
and i5 = 123451;
And this uses the index also:
set random_page_cost = 1;
explain (analyze, buffers)
select *
from tbloom
where i2 = 898732
and i5 = 123451;
This is the plan with when the index is used (either through "enable_seqscan = off" or "random_page_cost = 1")
Bitmap Heap Scan on tbloom (cost=138436.69..138440.70 rows=1 width=24) (actual time=42.444..42.444 rows=0 loops=1)
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Index Recheck: 2400
Heap Blocks: exact=2365
Buffers: shared hit=21973
-> Bitmap Index Scan on bloomidx (cost=0.00..138436.69 rows=1 width=0) (actual time=40.756..40.756 rows=2400 loops=1)
Index Cond: ((i2 = 898732) AND (i5 = 123451))
Buffers: shared hit=19608
Planning Time: 0.075 ms
Execution Time: 42.531 ms
And this is the plan when everything left at default settings:
Seq Scan on tbloom (cost=0.00..133695.80 rows=1 width=24) (actual time=1220.116..1220.116 rows=0 loops=1)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 10000000
Buffers: shared hit=4697 read=58998
I/O Timings: read=354.670
Planning Time: 0.075 ms
Execution Time: 1220.144 ms
Can this be considered a bug in the cost model of the bloom index implementation?
Or is it expected that this is only used if random access is really cheap?
Thomas
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2019-02-12 15:11:55 | Re: use Getopt::Long for catalog scripts |
Previous Message | Michael Meskes | 2019-02-12 14:06:11 | Re: [PROPOSAL]a new data type 'bytea' for ECPG |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-02-12 15:41:58 | Re: Bloom index cost model seems to be wrong |
Previous Message | Christoph Berg | 2019-02-12 12:00:18 | Re: Performance regressions found using sqlfuzz |