Re: anti-join chosen even when slower than old plan

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: anti-join chosen even when slower than old plan
Date: 2010-11-13 05:44:25
Message-ID: AANLkTi=rNOWUaJMgmSSK8O6ueGpQvKbbBrXoiW4+2heA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2010/11/12 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain
>> <cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
>>>> I wondering if we could do something with a formula like 3 *
>>>> amount_of_data_to_read / (3 * amount_of_data_to_read +
>>>> effective_cache_size) = percentage NOT cached.  That is, if we're
>>>> reading an amount of data equal to effective_cache_size, we assume 25%
>>>> caching, and plot a smooth curve through that point.  In the examples
>>>> above, we would assume that a 150MB read is 87% cached, a 1GB read is
>>>> 50% cached, and a 3GB read is 25% cached.
>
>>> But isn't it already the behavior of effective_cache_size usage ?
>
>> No.
>
> I think his point is that we already have a proven formula
> (Mackert-Lohmann) and shouldn't be inventing a new one out of thin air.
> The problem is to figure out what numbers to apply the M-L formula to.
>
> I've been thinking that we ought to try to use it in the context of the
> query as a whole rather than for individual table scans; the current
> usage already has some of that flavor but we haven't taken it to the
> logical conclusion.
>
>> The ideal of trying to know what is actually in cache strikes me as an
>> almost certain non-starter.
>
> Agreed on that point.  Plan stability would go out the window.

Point is not to now the current cache, but like for ANALYZE on a
regular basis (probably something around number of page read/hit) run
a cache_analyze which report stats like ANALYZE do, and may be
adjusted per table like auto_analyze is.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2010-11-13 05:53:27 Re: MVCC performance issue
Previous Message Jon Nelson 2010-11-13 03:31:55 Re: temporary tables, indexes, and query plans