Re: Index Scan cost expression

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Scan cost expression
Date: 2009-01-28 11:30:36
Message-ID: 87pri7llb7.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com> writes:

>> Moreover it only models a single index scan. It assumes nothing is cached
>> prior to the index scan which is very much not true if we're repeatedly
>> scanning similar ranges of keys.
>>
>
> It's reasonable to assume that nothing is cached for estimating the cost.

Not really, but doing otherwise is just hard. There's nothing in the query to
give Postgres a hint about which tables are more heavily used and more likely
to be in cache than others.

> 1 block !/O per index probe is a considerable cost.

Well it's closer to 0 than n...

>> Well they're all different but I suspect the root of what you're observing are
>> all the same thing. Cache doesn't affect any of these nodes unless we start
>> with something in the cache from previous queries and we don't model that. We
>> assume each query and even each plan node is run on a cold cache.
>
> Cost of evaluating operators depend heavily on available cache size,
> which is not considered by the Postgre optimizer at many places. For
> instance,
> - # I/O for sorting = T log_M T/M, where T is size of relation, and M
> is available memory.
>
> However, postgre assumes constant avaliable memory of 1M for sorting.
> Since sort is a blocking operator, which means that exeuction of other
> operators should halt when sorting is in progress, it should be able
> to hog all the available memory.

Well you're free to raise work_mem. Also, while sorting all happens in one
step the memory remains used after the sort is done.

It's true that if work_mem is set low but there's lots of cache available it
will speed up spilled hashes and on-disk merge sorts. But you would be better
off raising work_mem in those cases. If they're spilling because they don't
fit in RAM at all then will cache still have an effect?

>> I'm also not clear what kinds of formulas work for this. It has to be
>> something that can be composed effectively. That is, even if a plan node
>> doesn't want to discount itself at all for repetitions it has to include the
>> discount that any subplans have asked for. For example a large sequential scan
>> which expects to overflow effective_cache_size might not want to be discounted
>> at all, but if it has a subquery which scans a small table it will want to
>> discount that 100% for any repetitions since it'll be cached after the first
>> scan.
>
> We also need robust statistics to feed into these complex cost
> expression for accurate estimation. For instance, Oracle lets user to
> analyze each column to create distribution graph using histograms.
> These histograms is used by the optimizer to figure out exact number
> of rows (and their values ) output by an operator.

Well we certainly compute histograms and use them for selectivity estimates.
The challenge in this case is that you need to combine the distribution from
the outer node with the distribution in the inner node to estimate how much
overlap in disk accesses will result. So you need more than histograms, you
need some kind of cross-table statistics.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2009-01-28 11:54:47 Re: pg_upgrade project status
Previous Message Magnus Hagander 2009-01-28 11:19:46 Re: Patch to add Windows 7 support