Re: anti-join chosen even when slower than old plan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: anti-join chosen even when slower than old plan
Date: 2010-11-11 19:35:56
Message-ID: 19796.1289504156@sss.pgh.pa.us
Lists: pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Yeah. For Kevin's case, it seems like we want the caching percentage
> to vary not so much based on which table we're hitting at the moment
> but on how much of it we're actually reading.

Well, we could certainly take the expected number of pages to read and
compare that to effective_cache_size. The thing that's missing in that
equation is how much other stuff is competing for cache space. I've
tried to avoid having the planner need to know the total size of the
database cluster, but it's kind of hard to avoid that if you want to
model this honestly.
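
A minimal sketch of that comparison, not planner code. It assumes, purely for illustration, that cache space is shared evenly across every page in the cluster; the function names and the cluster_pages parameter are hypothetical, and only effective_cache_size corresponds to a real setting (expressed here in pages).

```python
def naive_cached_fraction(pages_to_read, effective_cache_pages):
    """Fraction of the read assumed cached if nothing else competes for cache."""
    if pages_to_read <= 0:
        return 1.0
    return min(1.0, effective_cache_pages / pages_to_read)


def contended_cached_fraction(pages_to_read, effective_cache_pages, cluster_pages):
    """Same estimate, with the cache shared evenly across the whole cluster.

    cluster_pages (hypothetical) is the total cluster size in pages, which is
    the quantity the planner would have to know to model competition.
    """
    if pages_to_read <= 0:
        return 1.0
    competing = max(cluster_pages, pages_to_read)
    return min(1.0, effective_cache_pages / competing)


# A 1,000-page read against a 10,000-page cache looks fully cached in
# isolation, but only 10% cached once a 100,000-page cluster competes.
print(naive_cached_fraction(1_000, 10_000))
print(contended_cached_fraction(1_000, 10_000, 100_000))
```

The gap between the two results is the missing term: without some notion of what else wants the cache, the first estimate is the only one available.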

Would it be at all workable to have an estimate that so many megs of a
table are in cache (independently of any other table), and then we could
scale the cost based on the expected number of pages to read versus that
number? The trick here is that DBAs really aren't going to want to set
such a per-table number (at least, most of the time) so we need a
formula to get to a default estimate for that number based on some simple
system-wide parameters. I'm not sure if that's any easier.
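
One way such a scheme might look, as a sketch under stated assumptions rather than a proposal for the actual cost model. The size-proportional default, both function names, and the cached_page_cost value are hypothetical; disk_page_cost uses 4.0 only because that is the default random_page_cost.

```python
def default_cached_pages(table_pages, effective_cache_pages, cluster_pages):
    """Hypothetical default: each table gets a cache share proportional to its size."""
    if cluster_pages <= 0:
        return 0.0
    share = effective_cache_pages * (table_pages / cluster_pages)
    return min(float(table_pages), share)


def scan_cost(pages_to_read, cached_pages, cached_page_cost=0.01, disk_page_cost=4.0):
    """Charge the cheap rate up to cached_pages and the disk rate beyond it.

    This treats the first cached_pages pages read as the cached ones, i.e. it
    assumes a small read always falls inside the cached part of the table.
    """
    from_cache = min(pages_to_read, cached_pages)
    from_disk = max(0.0, pages_to_read - cached_pages)
    return from_cache * cached_page_cost + from_disk * disk_page_cost


# A 20,000-page table in a 100,000-page cluster with a 10,000-page cache.
cached = default_cached_pages(20_000, 10_000, 100_000)
print(cached)
print(scan_cost(500, cached))
print(scan_cost(3_000, cached))
```

A DBA-supplied per-table number would simply replace the value returned by default_cached_pages; the cost scaling itself would not change.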

BTW, it seems that all these variants have an implicit assumption that
if you're reading a small part of the table it's probably part of the
working set; which is an assumption that could be 100% wrong. I don't
see a way around it without trying to characterize the data access at
an unworkably fine level, though.

regards, tom lane
