Re: Erroneous cost estimation for nested loop join

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: kawamichi(at)tkl(dot)iis(dot)u-tokyo(dot)ac(dot)jp, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Erroneous cost estimation for nested loop join
Date: 2015-11-16 23:50:56
Message-ID: CAMkU=1z-3JCPDv8LBYqP7XRin6WWjgvZh9OLjhusCvUjC-nS_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 9, 2015 at 6:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> kawamichi(at)tkl(dot)iis(dot)u-tokyo(dot)ac(dot)jp writes:
>> - cost parameter calibration: random_page_cost = 92.89
>
> TBH, you lost me there already. I know of no hardware on which that would
> be a sane depiction of reality, so I think you've probably overfitted the
> model to some particular case it was already inaccurate on.

I can easily get a ratio of random to sequential of 50, and my RAID is
nothing special. I don't see why a high-end RAID couldn't justify 100
or more, as long as they know the cache-hit is very low. (The default
value of 4 seems to bake in the notion that 90% of even random IO is
going to be satisfied from the cache, which might be a good estimate
if you have frequently used smallish lookup tables that always get
joined to the RAM-busters, but some people aren't going to have that
type of database queries as their main load).

With the current code, a single scan out of several can get estimated
to have a higher cost than just a free-standing single scan
(loop_count > 1), and I don't see how that can ever make sense.

Right now it can only benefit from assumed cache hits (driven by
effective_cache_size) via Mackert and Lohman.

I think that, at least, it should get to claim the greater of either
the Mackert and Lohman benefit between inner scans, or the benefit of
converting some random IO to sequential within each separate inner
scan.

And really, I don't see why it should not get both benefits. If the
pages are still in cache between inner scans, that's great. But if
the one time they do need to be read in from disk they are read in
mostly sequentially, why is that benefit not also fully justified? I
don't see where the worry about "double-counting the cache effects"
comes from. The effective_cache_size and io read-ahead can both apply
and both give benefits simultaneously and cumulatively.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2015-11-16 23:52:30 Re: check for interrupts in set_rtable_names
Previous Message Robert Haas 2015-11-16 23:47:19 Re: [DESIGN] ParallelAppend