Re: The science of optimization in practical terms?

From: decibel <decibel(at)decibel(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, jd(at)commandprompt(dot)com, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bernd Helmle <mailings(at)oopsware(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The science of optimization in practical terms?
Date: 2009-02-17 16:14:56
Message-ID: 00BD3A7D-CBEC-4B26-A23E-A0C83B8EB63D@decibel.org
Lists: pgsql-hackers

On Feb 15, 2009, at 9:54 PM, Robert Haas wrote:
> On Sun, Feb 15, 2009 at 1:16 PM, Greg Smith <gsmith(at)gregsmith(dot)com>
> wrote:
>> On Fri, 13 Feb 2009, Robert Haas wrote:
> This seems plausible, but I'm not totally sold: predicting the
> contents of the operating system buffer cache sounds like it might be
> pretty tough. And do we even need to go that far? I'm kind of
> wondering whether we might be able to leverage the information that
> the statistics collector already gathers for this purpose - in
> particular, the information on blocks fetched and read. That might
> not exactly model the current contents of the buffer cache, but it's
> certainly a measure of popularity, and that may be all we really need.
> We're not going to invalidate every plan in the system on every
> buffer eviction, so plans have to be based not so much on what is in
> the buffer cache right now but on what we have a reasonable
> expectation of finding there in the typical case.
>
> Consider, for example, the degenerate (but not necessarily uncommon)
> case where the entire database can fit within shared_buffers, or
> perhaps shared_buffers + OS cache. ISTM we're going to want to plan
> as if the entire database is in cache all the time, even though that
> might not always be true - right after restart, for example.

The shared_buffers + OS cache example is a reason why simply
examining shared_buffers isn't likely to work well; in that case it
definitely would not reflect reality. Really, though, in that case we
should be able to simply compare effective_cache_size against the
size of the database and conclude that everything should be in memory.
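That comparison can be sketched in a few lines. This is purely illustrative, not planner code; the function name and the way sizes are passed in are assumptions made for the example:

```python
# Hypothetical sketch: decide whether the whole database plausibly fits in
# cache by comparing the total size of all relations against the configured
# effective_cache_size. Names and sizes here are illustrative only.

def database_fits_in_cache(relation_sizes_bytes, effective_cache_size_bytes):
    """Return True if the sum of all relation sizes fits within the
    effective cache size estimate."""
    return sum(relation_sizes_bytes) <= effective_cache_size_bytes

GB = 1024 ** 3
# Three relations totalling 6 GB against an 8 GB cache estimate: fits.
print(database_fits_in_cache([4 * GB, 1 * GB, 1 * GB], 8 * GB))
```

If the check passes, the planner could safely cost everything as cached, which is exactly the degenerate case described above.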

Actually, a simple algorithm that might work really well would be to
calculate relation cache odds as ( number of page accesses for the
relation / number of page accesses for all relations ) * ( sum
(relpages)*BLCKSZ / effective_cache_size ), where the page-access
counts would include both accesses satisfied from the buffer cache
and those that were not. One thing this doesn't address, though, is
the report from a few months ago that accessing small tables is
still faster with an index scan, even if we know the whole thing is
in cache (I don't remember whether that was ever resolved...)
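The formula above can be written out as a minimal sketch. Everything here is an assumption for illustration: the function name, the inputs (per-relation and total access counts, total relpages, and effective_cache_size in bytes), and the 8 kB default block size:

```python
# Hedged sketch of the proposed cache-odds formula:
#   odds = (accesses for relation / accesses for all relations)
#          * (sum(relpages) * BLCKSZ / effective_cache_size)
# This reproduces the formula as stated; note it is not clamped and can
# exceed 1.0 when the relations are much larger than the cache.

BLCKSZ = 8192  # PostgreSQL's default block size in bytes

def relation_cache_odds(rel_accesses, total_accesses,
                        total_relpages, effective_cache_size):
    popularity = rel_accesses / total_accesses
    cache_pressure = (total_relpages * BLCKSZ) / effective_cache_size
    return popularity * cache_pressure
```

For example, a relation taking half the page accesses, where all relations together fill half of effective_cache_size, comes out at odds of 0.25.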

Another idea would be to look at an efficient way to measure how long
it actually takes to pull data from the OS. This has been suggested
in the past, but the idea then was to measure every block access,
and the concern was the overhead of the timing calls. But what if we
sampled instead? Or what if we read multiple blocks at a time in the
cases where we know we'll need them (a seqscan, or an index scan
needing more than one tuple)? Another option would be an async IO
process that is responsible for measuring this stuff; I believe some
people have played with async IO and gotten good results.
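The sampling idea can be sketched as follows. This is a toy model, not PostgreSQL's I/O path: `ReadTimer`, `read_block`, and the callback it times are all hypothetical names invented for the example:

```python
# Illustrative sketch of sampled I/O timing: instead of wrapping every block
# read in timing calls, time only a configurable fraction of them, keeping
# the overhead of the clock reads low while still estimating read latency.
import random
import time

class ReadTimer:
    def __init__(self, sample_rate=0.01):
        self.sample_rate = sample_rate  # fraction of reads to time
        self.samples = []               # recorded durations in seconds

    def read_block(self, do_read):
        # Only a sampled subset of reads pays for the two clock calls.
        if random.random() < self.sample_rate:
            start = time.perf_counter()
            result = do_read()
            self.samples.append(time.perf_counter() - start)
            return result
        return do_read()

    def avg_read_time(self):
        return sum(self.samples) / len(self.samples) if self.samples else None
```

The same structure would also cover the multi-block case: one pair of clock reads amortized over a batch of blocks instead of one.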

Of course, on dtrace platforms we could just plug into dtrace...

> You might also run into
> problems with relations that have "hot" segments that are accessed
> frequently and stay cached, and "cold" segments that are never
> touched: if 20% of the relation is in cache, but that's the only 20%
> of the relation we ever access, then our hit rate will be 100% rather
> than 20%.

Yes, but that would be accurate :)

In reality, I think we need to revisit the idea of evaluating how
closely a chosen query plan matches reality as we're running. If we
thought we'd see a 100% hit rate but in reality it's much lower, we
could re-plan (of course this probably only makes sense for queries
that take many seconds).
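That check could look something like the sketch below. The function name, the threshold, and the minimum-evidence cutoff are all assumed values for illustration; a real implementation would live inside the executor:

```python
# Sketch of the mid-query re-plan check described above: compare the hit
# rate the planner assumed against the hit rate observed so far, and flag
# a re-plan once they diverge past a threshold. Purely illustrative.

def should_replan(expected_hit_rate, hits, fetches,
                  threshold=0.5, min_fetches=1000):
    if fetches < min_fetches:
        return False  # too little evidence yet to second-guess the plan
    observed = hits / fetches
    return abs(expected_hit_rate - observed) > threshold

# Planner assumed a 100% hit rate, but only 10% of fetches hit cache.
print(should_replan(1.0, 200, 2000))
```

The `min_fetches` guard reflects the point above that re-planning only pays off for long-running queries, where enough fetches accumulate to make the observed rate meaningful.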

> But even a primitive algorithm would probably be a lot better than
> what we have now. I'm guessing that there are a lot of databases where
> either the whole database fits in cache, or a decent chunk of
> relatively small core relations fit in cache and then there are some
> big or infrequently-used ones that don't.

--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828
