On Fri, Feb 20, 2009 at 7:25 PM, decibel <decibel(at)decibel(dot)org> wrote:
> On Feb 17, 2009, at 11:23 PM, Robert Haas wrote:
>>> Actually, a simple algorithm that might work really well would be to
>>> calculate relation cache odds as ( number of page accesses for relation /
>>> number of page accesses for all relations ) * ( sum(relpages)*BLKSZ /
>>> eff_cache_size ), where number of page accesses would be both from
>>> and not.
>> I don't think that formula makes any sense. If effective_cache_size
>> is in the denominator, then increasing it will make the odds of
>> finding the page in cache go down.
> Yes, sorry... I got that part of the equation upside-down. It should be:
> ( number of page accesses for relation / number of page accesses for all
> relations ) * ( eff_cache_size / sum(relpages)*BLKSZ )
Well, that makes more sense, but it's still not right. Suppose I have
ten equal-sized relations whose total size is equal to
effective_cache_size. Relations 1-5 each get 15% of the page accesses
and relations 6-10 each get 5% of the page accesses. Under your
formula, relations 1-5 will be 150% in cache and relations 6-10 will
be 50% in cache. In reality, assuming sufficient frequency of access,
100% of all ten relations will be in cache.
I don't think there's any way to do this that doesn't involve some
sort of iterative process. What you need to do is compute (# of page
accesses for this relation / number of page accesses for all
relations) * effective_cache_size, dole out that amount of cache to it
(capped at 100% of the relation size), and then decrement
effective_cache_size by the amount of cache you doled out and the
number of page accesses by the number for that relation, and then
rerun for the second-most-popular relation.
For example, suppose (in the example above) that effective_cache_size
= 1GB and there are 10K page accesses total.
Relation 1: MAX(1.5K/10K * 1GB, 100MB) = MAX(150MB, 100MB) = 100MB
Relation 2: MAX(1.5K/8.5K * 900MB, 100MB) = MAX(159MB, 100MB) = 100MB
Relation 3: MAX(1.5K/7K * 800MB, 100MB) = MAX(171MB, 100MB) = 100MB
Relation 4: MAX(1.5K/5.5K * 700MB, 100MB) = MAX(190MB, 100MB) = 100MB
Relation 5: MAX(1.5K/4K * 600MB, 100MB) = MAX(225MB, 100MB) = 100MB
Relation 6: MAX(0.5K/2.5K * 500MB, 100MB) = MAX(100MB, 100MB) = 100MB
Relation 7: MAX(0.5K/2.0K * 400MB, 100MB) = MAX(100MB, 100MB) = 100MB
Relation 8: MAX(0.5K/1.5K * 300MB, 100MB) = MAX(100MB, 100MB) = 100MB
Relation 9: MAX(0.5K/1.0K * 200MB, 100MB) = MAX(100MB, 100MB) = 100MB
Relation 10: MAX(0.5K/0.5K * 100MB, 100MB) = MAX(100MB, 100MB) = 100MB
>>> One thing this doesn't address though is the report from a few
>>> months ago that accessing small tables is still faster with an index
>>> even if we know the whole thing is in cache (I don't remember if that was
>>> ever resolved...)
>> I'm not sure if this is what you're referring to, but there was a
>> relatively recent post on, I believe, -performance, where a bitmap
>> index scan that hit almost the entire table beat out a seqscan. I
>> don't think there was any further discussion and I'm still mystified
>> as to how it's possible.
> What I was thinking of was that when dealing with a very small table (one or
> maybe a few pages), the planner thinks that a seqscan is the fastest way to
> get a single row, but it's actually faster to use an index. The bitmap case
> is even more interesting. Something is seriously screwy with small seqscans
> it seems.
Do you have a good test case for this? I'd like to poke at it. It's
really just because the planner thinks that accessing the index pages
will cost a disk read, which is often false in practice. Does it help
if you set random_page_cost = seq_page_cost = 0.2?
The case I mentioned is qualitatively different because not only is
the planner wrong, but the observed behavior is somewhat inexplicable.
I have a feeling this may have to do with the fact that bitmap
indices can identify individual tuples on the page when the tbm is
non-lossy. Consulting the index (which is almost free if the page is
already in shared_buffers) not only finds the right page, but lets you
skip the CPU overhead of testing the quals against the irrelevant
tuples on that page. But we need to do some better testing here to
figure out what is really going on.
In response to
pgsql-hackers by date
|Next:||From: Frank Featherlight||Date: 2009-02-21 13:26:39|
|Subject: Service not starting: Error 1053|
|Previous:||From: Andrew Dunstan||Date: 2009-02-21 01:19:05|
|Subject: Re: parallel restore|