Re: Optimizer improvements: to do or not to do?

From: "Say42" <andrews42(at)yandex(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimizer improvements: to do or not to do?
Date: 2006-09-13 11:00:03
Message-ID: 1158145203.102440.55670@b28g2000cwb.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut wrote:
> But you haven't offered any analysis about the cause of this problem, so any
> speculation about normalization, usual cases, caching effects and so on are
> unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry.
Below I'll try to explain what I mean when I spoke about caching
effect. Let's take my pervious example (I repost query and some lines
from 'explain' here for convenience):

select count(*) from conn.conn20060803 c where
exists (select code from belg_mobile tc
where c.bnum >= tc.code and c.bnum like tc.code || '%'
order by tc.code desc limit 1)

Index Scan Backward using belg_mobile_pkey on belg_mobile tc
(cost=0.00..6.42 rows=1 width=10)
(actual time=0.012..0.012 rows=0 loops=494527)

Seq Scan on belg_mobile tc
(cost=0.00..2.19 rows=1 width=10)
(actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows (1 heap page) and has PK on code
column (2 index pages)). indexCorrelation is equal to 0.0445 and almost
don't affect cost estimation result.

PG cost estimation (as far as I know, of course):

Index scan cost = 2 (index pages) + 1 (heap pages) * 4
(random_page_cost)
+ ( 0.0025 (cpu_operator_cost) * 3 (# ops) + 0.001
(cpu_index_tuple_cost)
+ 0.01 (cpu_tuple_cost) ) * 68 (record count) * 0.5 (selectivity of
subquery)
~ 6 (pages fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1(heap page) + (0.0025 (cpu_operator_cost) * 3 (# ops)
+ 0.01 (cpu_tuple_cost)) * 68 (record count)
= 1 (pages fetch cost) + 1.19 (cpu cost) = 2.19

The estimation is ok if we touch the belg_mobile table only once. In
the subquery we do it many times. After the first iteration of the
subquery all the belg_mobile's heap and index pages are in the cache
and cost per iteration should be estimated using formulae:

Index scan cost = ( 6 (pages fetch cost) + 0.42 (cpu cost)
* 500K (conn table row count) ) / 500K ~ 0.42

Seq scan cost = ( 1 (pages fetch cost) + 1.19 (cpu cost)
* 500K (conn table row count) ) / 500K ~ 1.19

Index scan actually more cheaper because less than one tenth of conn
rows have appropriate codes in the belg_mobile table.

That's what I want to say. I am not a veteran DBMS user so I can not
gauge importance of this cost inaccuracy in the whole. I hope you help
me to look at the problem (?) more widely than I can at the moment.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2006-09-13 12:26:22 Re: Getting a move on for 8.2 beta
Previous Message Teodor Sigaev 2006-09-13 10:24:40 GIN documentation