On Thu, Nov 11, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Let's back up a moment and talk about what the overall goal is, here.
>> Ideally, we would like PostgreSQL to have excellent performance at all
>> times, under all circumstances, with minimal tuning. Therefore, we do
>> NOT want to add variables that will, by design, need constant manual
>> adjustment. That is why I suggested that Tom's idea of an
>> assume_cached GUC is probably not what we really want to do. On the
>> other hand, what I understand Mladen to be suggesting is something
>> completely different. He's basically saying that, of course, he wants
>> it to work out of the box most of the time, but since there are
>> guaranteed to be cases where it doesn't, how about providing some
>> knobs that aren't intended to be routinely twiddled but which are
>> available in case of emergency? Bravo, I say!
> Um ... those are exactly the same thing. You're just making different
> assumptions about how often you will need to twiddle the setting.
> Neither assumption is based on any visible evidence, unfortunately.
> I was thinking of assume_cached as something that could be
> set-and-forget most of the time, and you're entirely right to criticize
> it on the grounds that maybe it wouldn't. But to support a proposal
> that doesn't even exist yet on the grounds that it *would* be
> set-and-forget seems a tad inconsistent. We can't make that judgment
> without a whole lot more details than have been provided yet for any
> idea in this thread.
Well, maybe I misunderstood what you were proposing. I had the
impression that you were proposing something that would *by design*
require adjustment for each query, so evidently I missed the point.
It seems to me that random_page_cost and seq_page_cost are pretty
close to set-and-forget already. We don't have many reports of people
needing to tune these values on a per-query basis; most people seem to
just guesstimate a cluster-wide value and call it good. Refining the
algorithm should only make things better.
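For what it's worth, that cluster-wide guesstimate usually amounts to
a couple of lines in postgresql.conf. The values below are purely
illustrative, not a recommendation from anyone on this thread:

    # illustrative guesstimate only
    seq_page_cost = 1.0           # the default
    random_page_cost = 2.0        # knocked down from the 4.0 default to
                                  # account for a generally warm cache
    effective_cache_size = 4GB    # rough size of shared_buffers + OS cache

Once picked, settings like these tend to survive unchanged for the
life of the installation, which is about as set-and-forget as it gets.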
> I do think that something based around a settable-per-table caching
> percentage might be a reasonable way to proceed. But the devil is in
> the details, and we don't have those yet.
I think one of the larger devils in the details is deciding how to
estimate the assumed caching percentage when the user hasn't specified
one. Frankly, I suspect that if we simply added a reloption called
assumed_caching_percentage and made it default to zero, we would make
a bunch of DBAs happy; they'd knock down seq_page_cost and
random_page_cost enough to account for the general level of caching
and then bump assumed_caching_percentage up for hot tables/indexes (or
ones that they want to have become hot). I think we can do better
than that, but the right formula isn't exactly obvious. I feel safe
saying that if effective_cache_size=1GB and table_size=4MB, then we
ought to take the table as fully cached. But it's far from clear what
caching percentage we should assume when table_size=400MB, and it
seems like the sort of thing that will lead to endless bikeshedding.
There's probably no perfect answer, but I suspect we can come up with
something better than a constant (and even a constant would probably
be an improvement over what we have now).
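For concreteness, the per-relation knob could look like any existing
reloption; the statement below is hypothetical (the reloption doesn't
exist, and hot_orders is an invented table name), it just borrows the
syntax we already use for things like fillfactor:

    ALTER TABLE hot_orders SET (assumed_caching_percentage = 90);

And to show the shape one candidate default formula could take, here
is a minimal standalone C sketch. The function name, the 5%
fully-cached threshold, and the linear falloff between the thresholds
are all my assumptions; picking them for real is exactly the
bikeshedding hazard mentioned above.

    #include <stdio.h>

    /*
     * Hypothetical heuristic: what fraction of a relation may we assume
     * is cached, given effective_cache_size?  Both sizes in bytes.
     *
     * Assumed rule: a table using no more than 5% of the cache is fully
     * cached, a table that cannot fit at all is fully uncached, and we
     * interpolate linearly in between.
     */
    static double
    assumed_cache_fraction(double table_size, double effective_cache_size)
    {
        if (effective_cache_size <= 0.0)
            return 0.0;
        if (table_size <= 0.05 * effective_cache_size)
            return 1.0;             /* the 4MB-table-vs-1GB-cache case */
        if (table_size >= effective_cache_size)
            return 0.0;
        /* linear falloff between the two thresholds */
        return (effective_cache_size - table_size) /
               (0.95 * effective_cache_size);
    }

    int
    main(void)
    {
        double  ecs = 1024.0 * 1024 * 1024;     /* 1GB */

        printf("4MB:   %.2f\n", assumed_cache_fraction(4e6, ecs));
        printf("400MB: %.2f\n", assumed_cache_fraction(4e8, ecs));
        printf("2GB:   %.2f\n", assumed_cache_fraction(2e9, ecs));
        return 0;
    }

With effective_cache_size = 1GB this says a 4MB table is fully
cached, a 400MB table is about 66% cached, and a 2GB table isn't
cached at all, which at least matches the endpoints we seem to agree
on; everything in between is up for debate.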
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company