Re: anti-join chosen even when slower than old plan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: anti-join chosen even when slower than old plan
Date: 2010-11-11 19:02:14
Message-ID: AANLkTi=t7Oe9bSjK_apn_V2jMGu5Hp3b4iix4te2WER9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Nov 11, 2010 at 1:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Let's back up a moment and talk about what the overall goal is, here.
>> Ideally, we would like PostgreSQL to have excellent performance at all
>> times, under all circumstances, with minimal tuning.  Therefore, we do
>> NOT want to add variables that will, by design, need constant manual
>> adjustment.  That is why I suggested that Tom's idea of an
>> assume_cached GUC is probably not what we really want to do.   On the
>> other hand, what I understand Mladen to be suggesting is something
>> completely different.  He's basically saying that, of course, he wants
>> it to work out of the box most of the time, but since there are
>> guaranteed to be cases where it doesn't, how about providing some
>> knobs that aren't intended to be routinely twaddled but which are
>> available in case of emergency?  Bravo, I say!
>
> Um ... those are exactly the same thing.  You're just making different
> assumptions about how often you will need to twiddle the setting.
> Neither assumption is based on any visible evidence, unfortunately.
>
> I was thinking of assume_cached as something that could be
> set-and-forget most of the time, and you're entirely right to criticize
> it on the grounds that maybe it wouldn't.  But to support a proposal
> that doesn't even exist yet on the grounds that it *would* be
> set-and-forget seems a tad inconsistent.  We can't make that judgment
> without a whole lot more details than have been provided yet for any
> idea in this thread.

Well, maybe I misunderstood what you were proposing. I had the
impression that you were proposing something that would *by design*
require adjustment for each query, so evidently I missed the point.
It seems to me that random_page_cost and seq_page_cost are pretty
close to set-and-forget already. We don't have many reports of people
needing to tune these values on a per-query basis; most people seem to
just guesstimate a cluster-wide value and call it good. Refining the
algorithm should only make things better.

> I do think that something based around a settable-per-table caching
> percentage might be a reasonable way to proceed.  But the devil is in
> the details, and we don't have those yet.

I think one of the larger devils in the details is deciding how to
estimate the assumed caching percentage when the user hasn't specified
one. Frankly, I suspect that if we simply added a reloption called
assumed_caching_percentage and made it default to zero, we would make
a bunch of DBAs happy; they'd knock down seq_page_cost and
random_page_cost enough to account for the general level of caching
and then bump assumed_caching_percentage up for hot tables/indexes (or
ones that they want to have become hot). I think we can do better
than that, but the right formula isn't exactly obvious. I feel safe
saying that if effective_cache_size=1GB and table_size=4MB, then we
ought to take the table as fully cached. But it's far from clear what
caching percentage we should assume when table_size=400MB, and it
seems like the sort of thing that will lead to endless bikeshedding.
There's probably no perfect answer, but I feel we can likely come up
with something that is better than a constant (which would probably
still be better than what we have now).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2010-11-11 19:05:41 Re: anti-join chosen even when slower than old plan
Previous Message Tom Lane 2010-11-11 18:58:49 Re: anti-join chosen even when slower than old plan