Re: Question about (probably wrong) index scan cost for conditional indexes

From: Alex Lai <alai(at)sesda2(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Question about (probably wrong) index scan cost for conditional indexes
Date: 2012-02-01 19:21:45
Message-ID: 4F2990C9.7020901@sesda2.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom Lane wrote:
> Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> writes:
>
>> I know there is issue with statistics over intarrays (it was there
>> very long time and sometime it's complicating things a lot).
>>
>
>
>> However, the 100x cost difference between:
>> SELECT * from test order by id limit 100; (over "primary key (id)" btree index)
>> Limit (cost=0.00..3.43 rows=100 width=37)
>> vs
>> SELECT * from test where sections && '{2}' order by value limit 100;
>> (over "test_value_in2section_key on test(value) where sections &&
>> '{2}'" btree index)
>> Limit (cost=0.00..539.29 rows=100 width=37)
>> seems wrong for me.
>>
>
> [ shrug... ] It really is the fault of the bad rowcount estimate.
> The cost estimates for the complete indexscans are reasonable. However,
> in one case you've got LIMIT thinking that it will fetch the first 100
> out of 1000000 index entries, so it divides the indexscan cost estimate
> by 10000, and gets something reasonably accurate. In the other case,
> LIMIT thinks it's going to fetch the first 100 out of 1000 index
> entries, so it divides the indexscan cost estimate by 10, and comes out
> with something not very accurate. If the rowcount estimate for && had
> been correct, those numbers would be much more alike.
>
>
>> Both queries performs the absolutely same task: fetch 100 entries from
>> the table based on the ideally suitable index (no post
>> processing/filtering were done at all... just return 100 sorted tuples
>> based on single index scan).
>>
>
> Well, you know that and I know that, but the exposed cost and rowcount
> estimates for the IndexScan plan node imply something else entirely:
> that the cost-per-tuple-fetched is a lot higher in the one case than
> the other. The LIMIT estimator has no reason, or method, to second
> guess that.
>
>
>> And even if I drop the intarray index completely, than I still have a
>> wrong plan (bitmap scan + sort), because planner cost for the index
>> scan over conditional index 100 more the it should be.
>> (e.g. there is still an issue even in absence of the intarray index).
>>
>
> Yeah, because it's not about the index, it's about the selectivity of
> the && operator. That estimate is wrong regardless of whether there
> are any indexes involved.
>
>
>> Is absence of frequency statistics over intarrays somehow linked to
>> the wrong planner cost estimates for conditional index scan?
>>
>
> Well, we lack both the statistics and an operator selectivity function
> that would know what to do with them. Just a small matter of
> programming ...
>
> regards, tom lane
>
>
Tom,

I had the same situation in one of my query.
Use the subquery can speed up almost by 100 times faster.

explain analyse
select FileId as "FileId", ESDT as "ESDT",1 as "Position" from
V_FileMeta_L3
where Archiveset = 61000 and ESDT= 'ESDT123'
and Source = 'SOURCE1234'
and (
(StartTime between '2012-01-28
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
and '2012-01-28
07:41:27.000000Z'::timestamp + '100 days'::interval)
or
(EndTime between '2012-01-28
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
and '2012-01-28
07:41:27.000000Z'::timestamp + '100 days'::interval)
)
order by starttime
limit 1;

Limit (cost=0.00..15.20 rows=1 width=22) (actual time=200.048..200.048
rows=1 loops=1)
-> Nested Loop (cost=0.00..117596.32 rows=7736 width=22) (actual
time=200.046..200.046 rows=1 loops=1)
-> Index Scan using ak_filemeta_l3_esdt_starttime_endtime on
filemeta_l3 b (cost=0.00..77200.55 rows=7736 width=22) (actual
time=199.986..199.989
rows=2 loops=1)
Index Cond: ((esdt)::text = 'ROLPT'::text)
Filter: (((source)::text = 'OMPS-NPP'::text) AND
(((starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND
(starttime <= '2012-
05-07 07:41:27'::timestamp without time zone)) OR ((endtime >=
'2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <=
'2012-05-07 07:41:27'::ti
mestamp without time zone))))
-> Index Scan using pk_filemeta_archiveset on
filemeta_archiveset a (cost=0.00..5.21 rows=1 width=4) (actual
time=0.025..0.025 rows=0 loops=2)
Index Cond: ((a.fileid = b.fileid) AND (a.archiveset =
61000))
Total runtime: 200.102 ms
(8 rows)

explain analyse
select FileId as "FileId", ESDT as "ESDT",1 as "Position" from
V_FileMeta_L3
where FileId in (select fileid from V_FileMeta_L3
where Archiveset = 61000 and ESDT= 'ESDT123'
and Source = 'SOUCE1234'
and (
(StartTime between '2012-01-28
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
and '2012-01-28
07:41:27.000000Z'::timestamp + '100 days'::interval)
or
(EndTime between '2012-01-28
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
and '2012-01-28
07:41:27.000000Z'::timestamp + '100 days'::interval)
)
order by starttime)
limit 1;

Limit (cost=53038.95..53039.41 rows=1 width=14) (actual
time=2.502..2.502 rows=1 loops=1)
-> Nested Loop (cost=53038.95..6661196.32 rows=14611464 width=14)
(actual time=2.480..2.480 rows=1 loops=1)
-> Nested Loop (cost=53038.95..53716.80 rows=2549631 width=18)
(actual time=2.464..2.464 rows=1 loops=1)
-> HashAggregate (cost=53038.95..53040.95 rows=200
width=4) (actual time=2.445..2.445 rows=1 loops=1)
-> Sort (cost=52923.05..52942.37 rows=7727
width=12) (actual time=2.346..2.372 rows=43 loops=1)
Sort Key: b.starttime
Sort Method: quicksort Memory: 27kB
-> Nested Loop (cost=209.87..52424.05
rows=7727 width=12) (actual time=0.358..2.251 rows=43 loops=1)
-> Bitmap Heap Scan on filemeta_l3 b
(cost=209.87..12077.66 rows=7727 width=12) (actual time=0.262..0.685
rows=108 loops=1)
Recheck Cond: ((((esdt)::text =
'ROLPT'::text) AND (starttime >= '2012-01-28 08:14:57'::timestamp
without time zone) AND (starttime <= '2012-05-07 07:41:2
7'::timestamp without time zone)) OR (((esdt)::text = 'ROLPT'::text) AND
(endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND
(endtime <= '2012-05-07 07:41:27'::timestamp without
time zone)))
Filter: ((source)::text =
'OMPS-NPP'::text)
-> BitmapOr (cost=209.87..209.87
rows=9895 width=0) (actual time=0.223..0.223 rows=0 loops=1)
-> Bitmap Index Scan on
ak_filemeta_l3_esdt_starttime_endtime (cost=0.00..107.72 rows=4961
width=0) (actual time=0.127..0.127 rows=106 loops=1)
Index Cond:
(((esdt)::text = 'ROLPT'::text) AND (starttime >= '2012-01-28
08:14:57'::timestamp without time zone) AND (starttime <= '2012-05-0
7 07:41:27'::timestamp without time zone))
-> Bitmap Index Scan on
ak_filemeta_l3_esdt_endtime (cost=0.00..98.29 rows=4934 width=0)
(actual time=0.093..0.093 rows=108 loops=1)
Index Cond:
(((esdt)::text = 'ROLPT'::text) AND (endtime >= '2012-01-28
08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07
:41:27'::timestamp without time zone))
-> Index Scan using
pk_filemeta_archiveset on filemeta_archiveset a (cost=0.00..5.21 rows=1
width=4) (actual time=0.011..0.011 rows=0 loops=108)
Index Cond: ((a.fileid = b.fileid)
AND (a.archiveset = 61000))
-> Index Scan using pk_filemeta_l3 on filemeta_l3 b
(cost=0.00..3.37 rows=1 width=14) (actual time=0.015..0.015 rows=1 loops=1)
Index Cond: (b.fileid = a.fileid)
-> Index Scan using ak_filemeta_archiveset_fileid on
filemeta_archiveset a (cost=0.00..2.52 rows=6 width=4) (actual
time=0.012..0.012 rows=1 loops=1)
Index Cond: (a.fileid = b.fileid)
Total runtime: 2.711 ms

Hope this help.

Best Regards,
Alex Lai

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Robert Buckley 2012-02-01 20:07:37 Re2:
Previous Message Guillaume Lelarge 2012-02-01 17:38:21 Re: Server not starting problem